PRINCIPLES FOR THE VALIDATION AND USE OF PERSONNEL SELECTION PROCEDURES: SECOND EDITION

Document Number (FOIA)/ESDN (CREST): CIA-RDP00-01458R000100110001-9
Release Decision: RIFPUB
Original Classification: K
Document Page Count: 34
Document Creation Date: December 12, 2016
Document Release Date: October 11, 2001
Sequence Number: 1
Publication Date: January 1, 1980
Content Type: REPORT
PRINCIPLES FOR THE VALIDATION AND USE OF PERSONNEL SELECTION PROCEDURES: SECOND EDITION

This document is an official policy statement of the Division of Industrial-Organizational Psychology, American Psychological Association. It does not, however, necessarily represent the policy of the Association. Copies are available from the Secretary-Treasurer of the Division. The price schedule is: $4.00 each for 1-9 copies; $2.50 each for 10-49 copies; $2.00 each for 50 copies and up. Orders should be sent to: Dr. Lewis E. Albright, Kaiser Aluminum & Chemical Corporation, 300 Lakeside Drive, Room KB 2140, Oakland, California 94643.

Published by The Industrial-Organizational Psychologist, Berkeley, California, and printed by the University of California Printing Department, Berkeley, California. Copyright Division of Industrial-Organizational Psychology 1980.

Citation: American Psychological Association, Division of Industrial-Organizational Psychology. Principles for the validation and use of personnel selection procedures (Second edition). Berkeley, CA: Author, 1980.

Principles for the Validation and Use of Personnel Selection Procedures: Second Edition
Division of Industrial-Organizational Psychology
American Psychological Association
1980

Foreword

At the August 1978 meeting of the Division 14 Executive Committee the president, C. Paul Sparks, was instructed to appoint editors and an advisory panel to revise and update the Principles for the Validation and Use of Personnel Selection Procedures (1975). The reasoning behind this instruction included both the increased attention to tests and testing during the 1975-1978 period and a forecast of even greater attention in the future. This document resulted.

William A. Owens, Jr. and Mary L. Tenopyr accepted an invitation to serve as co-editors for the revision. Twenty-six Division members were invited to serve on the advisory panel. Twenty-five accepted (one later withdrew for personal reasons). The revision process was begun with a request that the advisory panel members furnish the co-editors with critical comments on the 1975 Principles. On the basis of these comments, a first draft was prepared and circulated to the advisory panel members. The responses were many and varied. Analysis of these indicated that major rewriting was necessary, not merely an update of the 1975 Principles. The target date had to be extended, and the Executive Committee of Division 14 at its September 1979 meeting instructed the incoming president, Mary L. Tenopyr, to press forward. The Executive Committee also expressed a desire that every member of Division 14 have an opportunity to express her/his opinion before publication. In December 1979, a draft was mailed to every Division 14 member, using mailing labels purchased from APA. In addition, addresses of new Division members not yet on APA rolls were secured, and they also received copies. In addition to a copy of the draft, each member received a questionnaire which asked for a rating of each section of the draft for agreement and clarity.
A discussion of the analysis afforded the replies and the results of the questionnaire were published in the May 1980 issue of The Industrial-Organizational Psychologist. In April 1980 what was perceived as a final draft was mailed to all members of the advisory panel and to all members of the Executive Committee. With minor editorial revisions, this draft was presented to the Executive Committee at its May meeting. Publication was approved unanimously. This document is, therefore, an official document of the Division of Industrial-Organizational Psychology.

The Division is deeply indebted to the co-editors, the members of the advisory panel, and the membership at large for their constructive suggestions.

C. Paul Sparks, President 1978-79
Mary L. Tenopyr, President 1979-80

Executive Committee, Division 14

Lewis E. Albright, Ph.D.
Kaiser Aluminum & Chemical Corporation
Milton R. Blood, Ph.D.
Georgia Institute of Technology
Richard J. Campbell, Ph.D.
American Telephone and Telegraph Company
Milton D. Hakel, Ph.D.
The Ohio State University
Virginia E. Schein, Ph.D.
The Wharton School, University of Pennsylvania
Frank L. Schmidt, Ph.D.
Personnel Research & Development Center, U.S. Office of Personnel Management
Benjamin Schneider, Ph.D.
Michigan State University
C. Paul Sparks
Exxon Company, U.S.A.
Mary L. Tenopyr, Ph.D.
American Telephone and Telegraph Company
Paul W. Thayer, Ph.D.
North Carolina State University
Victor H. Vroom, Ph.D.
Yale University
Kenneth N. Wexley, Ph.D.
University of Akron

Advisory Panel on Validation and Use of Personnel Selection Procedures, Division 14

William A. Owens, Jr., Ph.D. (Co-chair)
University of Georgia
Mary L. Tenopyr, Ph.D. (Co-chair)
American Telephone and Telegraph Company
Edwin A. Fleishman, Ph.D.
Advanced Research Resources Organization
Donald L. Grant, Ph.D.
University of Georgia
Lewis E. Albright, Ph.D.
Kaiser Aluminum & Chemical Corporation
Philip Ash, Ph.D.
University of Illinois, Chicago Circle
Richard S. Barrett, Ph.D.
Consultant
C. J. Bartlett, Ph.D.
University of Maryland
Brent N. Baxter, Ph.D.
American Institutes for Research
Virginia R. Boehm, Ph.D.
Sohio
William C. Burns
Pacific Gas & Electric Company
Joel T. Campbell, Ph.D.
Educational Testing Service
Jerome E. Doppelt, Ph.D.
The Psychological Corporation
Marvin D. Dunnette, Ph.D.
University of Minnesota
Frank W. Erwin
Richardson, Bellows, Henry & Co., Inc.
Robert M. Guion, Ph.D.
Bowling Green State University
James J. Kirkpatrick, Ph.D.
California State University, Long Beach
Hobart Osburn, Ph.D.
University of Houston
Charles A. Pounian, Ph.D.
City of Chicago
Erich P. Prien, Ph.D.
Memphis State University
Frank L. Schmidt, Ph.D.
Personnel Research & Development Center, U.S. Office of Personnel Management
Paul W. Thayer, Ph.D.
North Carolina State University
George C. Thornton, III, Ph.D.
Colorado State University
Harold J. Tragash, Ph.D.
Xerox Corporation
Kenneth N. Wexley, Ph.D.
University of Akron
Sheldon Zedeck, Ph.D.
University of California, Berkeley

Principles for the Validation and Use of Personnel Selection Procedures

Statement of Purpose

This statement of principles has been adopted by the Executive Committee of the Division of Industrial-Organizational Psychology (Division 14) of the American Psychological Association as the official statement of the Division concerning procedures for validation research and personnel selection. Its purpose is to specify principles of good practice in the choice, development, and evaluation of personnel selection procedures. Such selection procedures include, but are not limited to, standardized paper-and-pencil tests, performance tests, work samples, personality inventories, interest inventories, projective techniques, lie detector or stress analyzer techniques, assessment center evaluations, biographical data forms or scored application blanks, scored or rated interviews, educational requirements, experience requirements, reference checks, physical requirements such as height or weight or physical ability testing devices, appraisals of job performance, estimates of advancement potential, or any other selection standard, whenever any one or a combination of these is used or assists in making a personnel decision. When any selection procedure is used, the essential principle is that evidence be accumulated to show a relationship between decisions based on assessments made by that procedure and criteria such as job performance, training performance, advancement, or other pertinent job behavior.

This document is a revision of the Principles published in 1975 by that year's Division 14 Executive Committee. The revision was stimulated by ever-increasing attention to the selection practices of employers. This attention has been made manifest by significant research and theoretical formulations of measurement psychologists, by more detailed guidelines from equal employment opportunity enforcement agencies, and by numerous and diverse interpretations of the federal courts with respect to the extent to which the operational use of selection procedures comports with regulatory requirements and/or professional standards.

This statement intends to provide: (1) principles upon which the conduct of personnel research may be based, (2) guidance for practitioners conducting validation studies, (3) principles for application and use of valid selection procedures, and (4) information which may be helpful to personnel managers and others responsible for authorizing or implementing validation efforts.

The interests of some people will not be addressed by this statement. These Principles are not intended to: (1) be a technical translation of existing or anticipated regulation, (2) substitute for adequate training in validation procedures, (3) be exhaustive (although they cover the major aspects of validation), or (4) freeze the field to prescribed practices and so limit creative endeavors.

The last point deserves emphasis. Traditional technology calls for a showing that (a) assessments made by a particular method (or combination of methods) are useful for predicting behavior in some aspect of employment, and (b) that the predictions can be made within an acceptable allowance for error (usually expressed in terms of coefficients of correlation or percentage of misclassifications).
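As a purely illustrative aside (the data, cutoffs, and variable names below are invented and form no part of the Principles), the second of these customary expressions of the allowance for error, the percentage of misclassifications, can be tallied directly from paired predictor and criterion scores; the coefficient of correlation is illustrated later under the data analysis principles.

# Hypothetical scores for ten persons; illustrative only.
predictor = [42, 55, 61, 48, 70, 66, 38, 59, 74, 51]            # e.g., test scores
criterion = [2.1, 3.0, 3.4, 2.6, 3.9, 3.2, 1.8, 2.9, 4.1, 2.7]  # e.g., later ratings

cut_predictor, cut_criterion = 55.0, 3.0   # assumed cutoffs: "select" and "successful"

# A misclassification: selected but unsuccessful, or rejected but successful.
wrong = sum((x >= cut_predictor) != (y >= cut_criterion)
            for x, y in zip(predictor, criterion))
print(f"misclassifications: {100 * wrong / len(predictor):.0f}%")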
The use here of "predicting" and "predictions" implies no preference for a criterion-related predictive strategy. All measures made by a selection procedure are secured with the express or implied expectation that they will be related to one or more important aspects of job behavior.

The principles presented here are generally stated in the context of traditional approaches. Other developments in validation research are addressed as appropriate but are not systematically developed here; e.g., the use of what has been described as formal decision theory (Cronbach & Gleser, 1965; Dunnette, 1974), the various forms of synthetic validity (Guion, 1965; McCormick & Mecham, 1970; Primoff, 1972), Bayesian inference (Novick & Jackson, 1974; Schmidt, Hunter, Pearlman, & Shane, 1979), or internal/external validity (Cook & Campbell, 1976, 1979; Cronbach, 1980). The traditional approaches are used as a framework because their concepts have been established through a long history and are explicated in most current textbooks. It is sometimes difficult to define "a long history." Two well-known and respected professionals may disagree vehemently as to whether a given position has been thoroughly established or is still in the developmental stage.

The Principles are not meant to be at variance with the Standards for Educational and Psychological Tests (APA, 1974). However, the Standards were written for measurement problems in general while the Principles are addressed to the specific problems of decision making in the areas of employee selection, placement, promotion, etc. In addition, a Joint Committee of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education has completed a review of the 1974 Standards and has recommended that they be revised, generally for the same reasons that this Principles revision was undertaken (AERA, APA, & NCME, 1979). Further, the Committee recommends, "The new Standards should be a statement of technical requirements for sound professional practice and not a social action prescription." This Principles revision is consistent with that expression.

Like the Standards, the Principles stated here present ideals toward which the members of this Division and other researchers and practitioners are expected to strive. Circumstances in any individual study or application will affect the importance of any given principle. Researchers and practitioners should, however, consider very carefully any factors suggesting that a general principle is inapplicable or that its implementation is not feasible. It is most appropriate to bear in mind the following statement from the Standards, cited in full in the 1975 Principles and now repeated here:

A final caveat is necessary in view of the prominence of testing issues in litigation. This document is prepared as a technical guide for those within the sponsoring professions; it is not written as law. What is intended is a set of standards to be used, in part, for self-evaluation by test developers and test users. An evaluation of their competence does not rest on the literal satisfaction of every relevant provision of this document. The individual standards are statements of ideals or goals, some having priority over others.
Instead, an evaluation of competence depends on the degree to which the intent of this document has been satisfied by the test developer or user (APA, 1974, p. 8).

The Principles are intended to represent the consensus of professional knowledge and thought as it exists today, albeit not a consensus omnium since this is probably unattainable. Also, it is to be noted that personnel selection research and development is still an evolving field and techniques and decision-making models are subject to change. This document contains references for further reading and for support of the principles enunciated. It is expected that both researchers and practitioners will maintain an appropriate level of awareness of developments in the field.

Definition of Validity

Validity is the degree to which inferences from scores on tests or assessments are justified or supported by evidence. It should be noted that validity refers to the inferences made from the use of a procedure, not to the procedure itself. The primary question to be answered in validation is the degree to which these inferences are appropriate. Use of a specific procedure may lead to valid inferences in one area and yet fail to lead to valid inferences in another area. It is incumbent on the investigator to define, in advance of any validation effort, the inferences to be made and to plan the validation strategy accordingly. In planning validation it is not appropriate to think of validity as a single number or other result of a set of procedures.

Several authors (e.g., Dunnette & Borman, 1979) have criticized the rigidity with which validation procedures have been applied, with apparently little thought of the meaning to be imparted to the results of the tests or other assessment procedures. A particular problem is the compartmentalization of validity into the categories of criterion-related, content, and construct. The three are really inseparable aspects of validity, not discrete types of validity. Although the three may represent differences in strategy, they do not necessarily indicate differences in concept. For example, aptitude tests are typically associated with criterion-related validation. In their development, items or components are frequently chosen on the basis of content sampling. Construct considerations are usually a major factor in defending the domain from which the items or components are sampled. Also, as mentioned earlier, prediction is often thought of as closely associated with criterion-related validation. In employment situations the use of scores from a procedure developed on the basis of content also has a predictive basis. That is, one measures performance in a domain of job activities which will be performed later. Furthermore, constructs may be said to underlie all predictions and so render score interpretations meaningful.

The Principles discuss these three validity strategies separately only to take advantage of traditional presentations. However, the reader is advised that in concept, and many times in methodology, the three cannot be logically separated. The Principles also use the term "strategy" instead of "validity" in labeling the three aspects. The purpose of this usage is to emphasize again the interrelatedness of the three aspects. The Principles also contain discussions of the generality vs. specificity issue in validation.
The need to develop selection procedures with generality is emphasized, not only for practical considerations, but also to further the search for establishment of meaning relative to selection measures.

A Comment on "Fairness"

Social and legal influences have led to a concern, shared by psychologists, for fairness or equality in employment opportunity. A basic assumption of the principles of good practice is that those who follow them will also further the principle of fair employment. The interests of employers, applicants, and the public at large are best served when selection is made by the most valid means available. These Principles are technical in focus. They are primarily concerned with validity. The maximization of opportunities for each individual can be most effective where validity enables one to attain the highest level of accuracy in prediction or assessment of qualifications.

Fairness of a selection procedure, when criterion-related methodology is used, has been subject to many definitions. There are two basic classes of definition, the psychometric and the decision-making. The psychometric models advanced are numerous (Cleary, 1968; Cole, 1973; Darlington, 1971; Einhorn & Bass, 1971; Guion, 1966; Linn, 1973; Thorndike, 1971). Results yielded by these models are often not consistent with each other and may even be contradictory. The Principles do not at this stage of the professional debate advocate any one model. However, the reader is directed to Petersen and Novick (1976), who have pointed out that all of the models except those of Cleary and Einhorn and Bass have problems relative to their internal consistency. The model proposed by Guion is also not internally faulty. Most of the recent work has been devoted to decision-theoretic models (Cronbach, 1976; Petersen & Novick, 1976; Schwartz, 1978). These models require an advance specification of utilities, thereby essentially removing the question of fairness from the hands of the psychometricians.

Application of Principles

It is not likely that anyone will completely satisfy the ideal of every applicable principle. This probability raises the question of relative levels of stringency in adhering to the individual principles. The importance of a principle depends primarily on the consequences of failure to satisfy it. In selection research, where failure to adhere to a given principle would create a serious possibility of an erroneous decision about the validity or job-relatedness of a selection procedure, it is particularly important to adhere to proper procedures. In the operational use of validated selection procedures, the importance of adherence to the Principles again depends on the consequences of error. Will selection errors result in physical, psychological, or economic injury to people? Will the safety or operating efficiency of the organization be impaired because of selection errors? If so, then the principles may need to be followed more rigorously than in less crucial situations.

Three axioms underlie the application of all these principles:
(1) Individuals differ in many ways.
(2) Individual differences in personal characteristics and backgrounds are often related to individual differences in behavior on the job.
(3) It is in the best interest of organizations and employees that information about relevant differences between individuals be developed and used in assigning people to jobs.

Objectives of Validation Efforts

Before any assessment procedure is considered, or any validation effort is planned, one should have a clear idea of the objective of the assessment or validation. Any such statement of purpose logically must come from an understanding of the needs of the organization and of its present and prospective employees. As a general matter, a researcher should develop clear objectives for the proposed assessment procedure(s) and design the validation effort to determine how well they have been achieved. Objectives should be consistent with professional, ethical, and legal responsibilities.

Ideally, all aspects of the decision-making process should make a valid contribution to achievement of those objectives. Researchers should present evidence for the validity of as many aspects of the decision-making process as feasible. All assessment methods used should make a contribution to validity in ways which can be demonstrated. However, when it is impossible or infeasible to apply validation methods to a given part of the decision-making process, that part should have a relationship, discernible by a knowledgeable person, to appropriate purposes of the organization.

Job Analysis

A systematic examination of the job and the context in which it is performed will provide an enhanced understanding of the selection problem. This will also enhance the likelihood of finding a significant relationship between predictors and criteria in a criterion-related study through development of hypotheses concerning predictors and development or evaluation of criteria. Job analysis is essential to the development of a content oriented procedure. A number of job analysis procedures exist, each differing in terms of its possible contribution to the objectives of the particular study or a portion of the study (McCormick, 1979). There is currently no authoritative set of principles for job analysis comparable to the Standards or Principles in the area of selection procedures. The development of such a set is beyond the scope of this document. Discussed below are some of the elements of current practice and some of the constraints which they impose.

All formal job analysis techniques specify the descriptors, or units of analysis, by which the job(s) will be defined. One way of classifying such techniques is by the nature of the descriptor specified and the type of job definition produced. For example, task analysis specifies the use of task or activity statement descriptors which culminate in a definition of the job-oriented content of the job(s); work behavior analysis specifies the use of behavior statement descriptors which culminate in a definition of the worker-oriented content of the job(s). Another way of classifying job analysis techniques is through the systems and methods used. Some systems provide a standardized set of job descriptors, usually an inventory or a questionnaire, which is programmed to provide output along a prescribed set of dimensions (Baehr, 1971; McCormick, Jeanneret, & Mecham, 1972; Pass & Cunningham, 1977).
Other systems or methods require origination and development of the job descriptors by the analyst but with the analysis programmed to provide results according to a prescribed matrix of dimensions (Christal & Weissmuller, 1976; Fine & Wiley, 1971; Primoff, 1971). A summary of job analysis results to that date has been published by Prien and Ronan (1971).

The objective of the research is to obtain job information appropriate to the purpose or application of that job analysis information. The choice of job analysis methodology (e.g., the descriptors chosen and the job analysis operations used) is determined by that objective but with situational constraints. Constraints which need to be considered in the choice of method include, among others, the nature of the jobs, the situation, the resources available to the researcher, the research design, and the types of evaluative operations which are included in the research design. For example, the extent to which the researcher's objectives include assessing similarities among jobs or the formation of job families may be an important element in the choice of technique (Cornelius, Carron, & Collins, 1979). Pearlman (1980) reviews the literature and examines the conceptual and research issues in this area.

Criterion-Related Strategy

In general, the use of any personnel selection procedure is to predict future performance as measured by some job relevant criterion. Evidence for criterion-related validity typically consists of a demonstration of a statistically significant relationship between the selection procedure (predictor or predictors) and one or more measures of job relevant performance (criterion or criteria). It is, therefore, vital that the choice of both predictors and criteria be evaluated with great care.

In this section the word "predictor" will be used to refer to any aid to decision-making used in the context of personnel selection (in or out), placement, classification, or promotion. It will include, but not be limited to, standardized ability tests, personality inventories, biographical data forms, situational tests, assessment center evaluations, interview-based ratings, performance ratings, evaluations of training or experience, etc. (See Statement of Purpose, p. 1.) Predictors which are objective or "standardized" are preferred; i.e., where standard directions and procedures for administration, scoring, and interpretation are both delineated and employed. The principles of this section apply to all predictors, but more easily to those more rigorously standardized.

A. Determination of Feasibility. Anyone contemplating a criterion-related validity study must first determine whether such a study is feasible. It is not always possible to conduct a well-designed or even a reasonably competent study; and although it may be argued that most errors merely reduce estimated validity, a poor study is not better than none. Several considerations are relevant in determining feasibility.

First, one must be able to assume that the job is reasonably stable and not in some period of rapid evolution. Although validity coefficients seem to be quite robust across both tasks and situations (Schmidt, Hunter, & Pearlman, in press), the traditional logic of validation research is that it is undertaken under conditions as comparable as possible to those which will exist when the results are made operational.
If this assumption is obviously and grossly in violation, it is incumbent on the researcher either to modify the validation strategy appropriately or to postpone the study until reasonable stability has returned.

Second, it must be possible to obtain or develop a relevant, reasonably reliable, and uncontaminated criterion measure(s). Of these characteristics, the most important is relevance. This means that the criterion must accurately reflect the relative standing of employees with respect to prescribed job behaviors. If such a criterion measure does not exist or cannot be developed, criterion-related validation is not feasible. Criterion-related studies based upon criterion availability alone, rather than upon relevance, are inappropriate.

Third, a competent criterion-related validation should be based to the extent feasible on a sample which is reasonably representative of the populations of people and jobs to which the results are to be generalized. As mentioned previously, validities appear to be quite stable across both tasks and situations, but there are influences, such as restriction of range in the predictor, the criterion, or both, which may obviously distort an estimate obtained from a particular sample. When there is evidence that gross distortion has occurred, the researcher must either estimate its impact, and adjust for it, or must conclude that it is not feasible to conduct a criterion-related validation.

Fourth, to conduct a criterion-related validity study which potentially lacks adequate "statistical power" may leave the issue of validity unresolved. The term power refers to the probability of obtaining a statistically significant relationship between predictor and criterion in a sample if such a relationship does, in fact, exist. Factors determining statistical power include sample size, degree of predictor range restriction, criterion reliability, and the size of the predictor-criterion relationship (Cohen, 1977). Combinations of these variables leading to low power can occur frequently in practice (Schmidt, Hunter, & Urry, 1976). As a consequence, it is quite possible to conclude that a significant predictor-criterion relationship is lacking when one does, in fact, exist. If the requirements cannot be met, the situation may not lend itself to a criterion-related validation.

Fifth, the previous discussion has implied the use of correlational statistics developed from predictor-criterion relationships. A special case must be made for those situations in which some intervening variable has essentially eliminated all variance from the criterion. An example of this is a self-paced training situation in which all selectees have attained a mastery level of the knowledge or skill being taught. If the training content is truly job related, no significant correlation can be obtained between training success and job success since there will be little or no variance in either the training success predictor or the job success criterion. There will be no significant correlation with a predictor selected to predict success in training since there will be little or no variance in the success in training criterion. Training time may be the only feasible criterion with an acceptable amount of variance present. Use of experimental and control groups (with the experimental group selected on the predictor and the control group selected by some method which assures randomness) may provide evidence of validity in such a situation (Goldstein, 1980).
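The fourth consideration above, statistical power, can be examined before any data are collected. The following sketch is a purely illustrative calculation, not a procedure prescribed by these Principles: the assumed population validity, the one-tailed .05 significance level, and the sample sizes are invented for the example, and the approximation uses Fisher's r-to-z transformation.

from math import atanh, erf, sqrt

def normal_cdf(z):
    # Cumulative probability of the standard normal distribution.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power_for_validity(true_r, n, critical_z=1.645):
    # Approximate power of a one-tailed test (alpha = .05) that the validity
    # exceeds zero, for an assumed population validity true_r and sample size n,
    # using Fisher's r-to-z transformation.
    return normal_cdf(atanh(true_r) * sqrt(n - 3) - critical_z)

# Assumed population validity of .25, examined at several hypothetical sample sizes.
for n in (50, 100, 200, 400):
    print(f"n = {n:3d}  approximate power = {power_for_validity(0.25, n):.2f}")

On these invented figures, a sample of 50 gives only about an even chance of detecting a true validity of .25, which is precisely the risk this consideration warns against.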
B. Design and Conduct of Validity Studies. If it has been determined that a criterion-related study is feasible, attention may then be directed to the design and conduct of such a study. There are two criterion-related designs for generating evidence as to the validity of a measuring device. One design employs the predictive model, in which predictor information is obtained prior to placement of employees on a job and criterion information is obtained later. This design answers the most common employment question; i.e., does the predictor indeed have forecasting value with respect to later job behavior? As such, the predictive model addresses itself to the basic selection issue as it normally occurs in the employment context.

The other design is the concurrent model, in which both predictor and criterion information are obtained for present employees at approximately the same time. The research literature clearly indicates that well conducted concurrent studies can provide useful estimates of predictive validity (Bemis, 1968; Pearlman, Schmidt, & Hunter, in press). Both types of criterion-related studies are susceptible to the effects of range restriction. However, the test scores obtained in concurrent studies may also be influenced by additional job knowledge, different motivation, or added maturity of incumbents vs. applicants. A concurrent study with appropriate controls should yield results very comparable to those of a predictive study.

1. Criterion Development. Once a validation model has been selected, the researcher should next be concerned with obtaining any necessary job information. In general, if criteria are chosen to represent job relevant activities or behaviors, the results of a formal job analysis will be helpful in criterion construction. Although numerous procedures are available (see p. 4), there does not appear to be a clear choice of method. What is essential, however, is that information about the job be competently and systematically developed. If the goal of a given study is the exclusive prediction of such nonperformance criteria as tenure or absenteeism, a formal job analysis will not usually be necessary, though an understanding of the job and its context will still be beneficial. Some considerations in criterion development follow.

a. Criteria Should be Related to the Purposes of the Investigation. Criteria should be chosen on the basis of relevance, freedom from contamination, and reliability rather than on the basis of availability. This implies that the purposes of the research are (1) clearly stated, (2) acceptable in the social and legal context in which the organization functions, and (3) appropriate to the organization's needs and purposes. If adequate measures of important components of job performance are not attainable, it is not acceptable practice to substitute measures which are unrelated to the purposes of the study. One may not achieve the appearance of broad coverage by substituting irrelevant criteria which are available for relevant criteria which are unavailable.

b. All Criteria Should Represent Important Work Behaviors or Work Outputs, on the Job or in Job-Relevant Training, As Indicated By An Appropriate Review of Information About the Job. Criteria need not be all-inclusive, but there should be clear documentation of the reasoning determining what is and what is not included in a criterion.
Criteria need not be measures of actual job performance. In many cases, in fact, actual job performance measures may not possess the desirable characteristics specified above for criteria. Depending upon the job being studied and the purposes of the researcher, various criteria such as overall proficiency measured with a standard work sample, success in job relevant training, sales records, number of prospects called, turnover, or rate of advancement may be more appropriate (Wallace, 1965).

c. The Possibility of Bias or Other Contamination Should be Considered. Although a simple group difference on the criterion does not establish bias, such bias would result if a definable subgroup were rated consistently and spuriously high (or low) as compared to other groups. Conversely, if a group difference did, in fact, exist but were not revealed by appropriate ratings, this would also constitute bias. It is therefore apparent that the presence or absence of bias cannot be detected from a knowledge of criterion scores alone. If objective and subjective criteria disagree, bias in the more subjective measure may be suspected, although bias is not limited to subjective measures. There is no clear path to truth in these matters. A criterion difference between older and younger employees, or day and night shifts, may reflect bias in raters, equipment, or conditions, or it may also reflect genuine differences in performance. What is required is the anticipation and reduction of the possibility of bias, alertness to this possibility, protection against it insofar as is feasible, and use of the best judgment possible in evaluating the data. Contamination, per se, could exist if selection test results were available to supervisors making presumably independent performance ratings. Correction after the fact is a near impossibility in this case.

d. If Evidence Recommends that Several Criteria be Combined to Obtain a Single Variate, There Should be a Rationale to Support the Rules of Combination. For example, it is probably generally preferable to weight for relevance, although special circumstances may occasionally argue otherwise. Thus, if well informed judges are unavailable, it may be best to assign unit or equal weights to the several criterion components.

e. It is Desirable, But Not Essential, That Criterion Measures be Highly Reliable. Reliability should be estimated, where feasible, and by appropriate methods (e.g., Stanley, 1971). It must be recognized that criterion reliability places a ceiling on observed validity coefficients. Thus, the effect of criterion unreliability is to cause an underestimation of true validity.

2. Choice of Predictor. There are numerous factors other than availability which should influence choice of the predictor(s). Several of these follow.

a. Predictor Variables Should be Chosen for Which There is an Empirical, Logical, or Theoretical Foundation. This principle does not call for elegance in the reasoning underlying the choice of predictors so much as it does for having some reasoning. A study is more likely to indicate validity if there is a good reason to suppose that a relationship exists between a predictor chosen and the behavior it is supposed to predict. For example, the research literature or the logic of development may provide the reason.
This principle does not intend to rule out application of serendipitous findings, although such findings usually need verification.

b. Preliminary Choices Among Predictors Should Be Based on the Researcher's Scientific Knowledge Without Regard for Personal Bias and Prejudice. The researcher's choice of trial predictors should yield to the findings of relevant research and resist the influence of personal interest, mere familiarity, or expediency. On the other hand, the researcher must exercise some critical judgment to achieve the parsimony in a predictor battery necessary to minimize predictor redundancy or the capitalization on chance which may occur with small samples.

c. Other Things Equal, Predictors Which Are More Objective Are to be Preferred. Thus, the assessment of a candidate should be maximally dependent on his/her personal characteristics and minimally dependent on who made the assessment. Similarly, where non-test predictors like interviewer judgments are utilized, an effort should be made to develop procedures which will minimize such sources of error variance as are represented by differences between judges.

d. Outcomes of Decision Strategies Should be Recognized as Predictors. It must be noted that the decision-maker who interprets and acts upon a complex of predictor data interjects something of himself/herself into the interpretive or decision-making process. These judgments or these decisions thus become at the least an additional predictor, or at the most the only predictor. So, for example, if the decision strategy is to combine test and non-test data (reference checks, medical data, etc.) into a subjective judgment, the actual predictor is the judgment reached by the person who weights and summarizes all the information.

3. Choice of Sample. The meaningfulness of the research result is greatly dependent on the sample. Having several hundred subjects may not be better than one hundred if the selection of subjects chosen to obtain the larger N does not have an appropriate rationale.

a. The Sample for a Validation Study Should be Carefully Chosen. Whether the study is predictive or concurrent, the incumbent sample is unlikely to be representative of the applicant group on all variables. Whether such characteristics as age, race, or sex affect predictor-criterion relationships is an empirical question, and the researcher should therefore rely on the research literature in making professional judgments about their possible relevance. Because many
Such findings include, but are not limited to, suppressor or moderator effects, nonlinear regression, benefits of configural scoring, or other potentially chance outcomes. Post hoc hypotheses in multivariate studies, and differential weightings of highly correlated variables are particularly suspect. d. When Combining Data from Separate Samples, Both Jobs and Workers Should be Comparable on Variables Which Research has Shown to Affect Validity. If comparability exists on these variables, pooled samples may be expected to provide increased statistical power. 4. Procedural Considerations. The researcher must consider the probable use of any end products. This should be done in advance of the collection and analysis of data. a. Validation Research Should Ordinarily be Directed to Entry Jobs, Imme- diate Promotions, or Jobs Likely to be Attained. Where a selection procedure is designed for a higher level job than that for which candidates are initially selected, that job may be considered an appropriate target job if the majority of the indi- viduals who remain employed and available for advancement progress to the higher level within a reasonable period of time. Where a majority are not advanced to the higher level job, it may still be acceptable to evaluate for such job(s) if the validity study is conducted using criteria that reflect performance at the higher level along with criteria for adequate performance at the entry level. Predictability may diminish over long time spans as a result of changes in abilities and skills required, changes in the job itself, increased restriction of range in the subject pool, and related factors. On the other hand, predictability may increase as the demands of the higher level job result in greater differentiation of the performance of job incumbents or rate of advancement results in varying demands on the indi- vidual. Here again, the purposes of the study are paramount. b. Where Traditional Criterion-Related Validation Strategy is Not Feasible, the Researcher Should Consider Any Alternative Research Methodology Which Offers a Sound Rationale. Examples include synthetic validation, cooperative research on an industry-wide basis, consortia of small users, or gathering data for validity generalization. However, the researcher should be aware that most non- traditional approaches require considerable research and development effort. c. Procedures for Test Administration and Scoring in Validation Research Should be Clearly Set Forth and Should be Consistent with the Standardization Planned for Operational Use. Any specified operational characteristics (such as time limits, oral instructions, practice problems, answer sheets, and scoring formulas) should be clearly set forth and followed in validation research. Failure to do this essentially prohibits generalizations from the research to the operational context. The point of this principle is that for research to enhance the general body of knowledge, the critical research procedures must be consistent with those which are to be utilized in practice. d. It is Desirable That There be at Least Presumptive Evidence for the Validity of a Predictor Prior to its Operational Use. If possible, predictors should Approved For Release 2001/11/07 :9CIA-RDP00-01458R000100110001-9 Approved For Release 2001/11/07 : CIA-RDP00-01458R000100110001-9 his validated prior to operational use. 
Some researchers find this principle difficult to follow because of the employer's need tb get on with the business of making employment decisions. Where there is exterrhal evidence which supports the prob- alpility of valid prediction, it may be feasible' to utilize the predictors immediately. However, the researcher must avoid situations that make it impossible or difficult to detect validity. For example, decisions should not he so highly selective that severe restriction of range results. If there id no firm basis for the presumption of validity, the researcher must carefully judge whether the dangers of postponing the use of the predictor are greater or less than the dangers of using it prematurely. e. The Collection of Predictor Data land Criterion Measures Should be Operationally Independent. A common example of non-independence is the collection of criterion ratings from supervisors who know selection test scores. If a significant validity coefficient is obtaitbed. it may be due either to a true relationship or to the manipulation of ratirips (consciously or unconsciously) to conform with scores. Such ambiguity should be avoided. 5. Data Analvsis. Modern computer tiechnology allows the researcher to investigate different predictor-criterion relationships, different statistical tech- niques, etc.. with considerable freedom and little cost. Any result based upon an extensive post hoc analysis should be replicated. a. The Method of Analysis Should beChosen with Due Consideration for the Characteristics of the Data and the Assurfiptions Involved in the Development ol the Data Analysis Method. Some violations of assumptions can be tolerated with few ill effects: violations of others mast produce grossly misleading results. It is the responsibility of the investigator to Idnow the assumptions of the methods chosen and the consequences of violations di' them. b. The Tvpe of Statistical Analysis to lbe Used Should he Considered in Planning Me Research. The kinds of decisions to be made, and the way in which predictor variables are to he used in deteirmining these decisions, should be considered in selecting the method(s) of analysis to be employed. Although any standard method(s) may be used, any new of unusual method should be clearly explained in the research report. (It is unddrstood that conditions may develop in the course of an investigation which will rdquire a change in plans.) c. Data Analysis Should Yield Appropriate Information About the Relation- ship Between Predictor and Criterion Meakures. The analysis should provide information about the magnitude and statistical significance of a relationship. Tradi- 'ionally, a validity coefficient or similar statistic which has a probability of less Than one in twenty of having occurred by chankT may be considered as establishing significant validity. There may be exceptionsi to this rule; professional standards have never insisted on a specific level of significance. However, departures from this convention should be based on reasons Which can he stated in advance (such as power functions, utility, economic necessit,; etc.). The analysis should provide information about the strength of the relationship. This is usually expressed in terms of coefficients of correlation but other' methods (such as the slope of the regression line or the percentage of misclassifications) are acceptable and even preferable in many situations. 
The analysis should also give information about the nature of the relationship and how it might be used in prediction. For example, in comparing groups, the slope of the regression line is generally preferable to the coefficient of correlation. Use of expectancy tables may also be appropriate. Information provided should in any event include numbers of cases and measures of central tendency and variability for both predictor and criterion variables.

d. The Psychologist Should Attempt to Obtain an Unbiased Estimate of Operational Predictor Validity in the Population in Which It Will Be Used. Observed validity coefficients are typically not unbiased (Schmidt, Hunter, McKenzie, & Muldrow, 1979). Where range restriction operates to bias validity estimates, the appropriate adjustments should be made whenever the information necessary to do so can be obtained. Adjustments for criterion unreliability should likewise be made whenever an appropriate estimate of criterion reliability can be obtained. Psychologists should give careful attention to ensuring that reliability estimates used are appropriate to this correction in order to avoid under- or over-estimating validity. Both unadjusted and adjusted coefficients should be reported. Researchers should be aware that the usual tests of statistical significance are not applicable to coefficients adjusted for restriction of range and/or criterion unreliability. Nevertheless, the adjusted coefficient is generally the best point estimate one can make of the relationship between predictor and criterion. No adjustment of a validity coefficient for unreliability of a predictor should be reported unless one clearly notes that the resultant coefficient is theoretical in nature and not operational.

e. Where Predictors are to be Used in Combination, Researchers Should Give Careful Consideration to Choice of the Mode of Combination. Researchers should be aware that nonlinear selection decision rules (e.g., random selection from among those scoring above a cutoff) typically reduce the utility of valid selection procedures. When nonlinear selection rules are recommended, a clear rationale (e.g., in terms of administrative convenience or reduced testing costs) should be provided. Tests with linear relationships with job performance can be combined for actual use in either a linear manner (e.g., by summing scores on different tests) or in a nonlinear manner (e.g., by using multiple cutoffs), but the researcher should be aware of the productivity, administrative, and other implications of each choice.

f. Researchers Should Guard Against Overestimates of Validity Resulting from Capitalization on Chance. Especially when initial sample size is small, estimates of the validity of a composite battery developed on the basis of a regression equation should be adjusted using the appropriate shrinkage formula or be cross-validated on a new sample. It should be noted that the assignment of either rational or unit weights to predictors does not result in shrinkage in the usual sense. Where a smaller number of predictors is selected for use based on sample validity coefficients from a larger number included in the study, most shrinkage formulas are inappropriate and the alternative is cross-validation unless sample sizes are large.
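Principle d above calls for the usual adjustments without giving their formulas. The sketch below is an illustration only: the numbers are invented, and the two corrections shown are the standard ones for direct restriction of range on the predictor and for criterion unreliability, not procedures prescribed by these Principles. Both the unadjusted and the adjusted coefficients are reported, as principle d requires.

from math import sqrt

def corrected_for_range_restriction(r, sd_unrestricted, sd_restricted):
    # Standard correction for direct selection on the predictor.
    k = sd_unrestricted / sd_restricted
    return (r * k) / sqrt(1.0 - r * r + (r * k) ** 2)

def corrected_for_criterion_unreliability(r, criterion_reliability):
    # Standard correction for attenuation due to criterion unreliability only.
    return r / sqrt(criterion_reliability)

observed_r = 0.25      # observed validity in the restricted, attenuated sample (assumed)
sd_applicants = 10.0   # predictor standard deviation among applicants (assumed)
sd_incumbents = 7.0    # predictor standard deviation among those studied (assumed)
criterion_rel = 0.60   # estimated reliability of the criterion (assumed)

r_range = corrected_for_range_restriction(observed_r, sd_applicants, sd_incumbents)
r_both = corrected_for_criterion_unreliability(r_range, criterion_rel)
print(f"unadjusted r = {observed_r:.2f}; adjusted for range restriction = {r_range:.2f}; "
      f"also adjusted for criterion unreliability = {r_both:.2f}")

As the principle notes, the usual significance tests do not apply to the adjusted values, and no adjustment for predictor unreliability is shown because the resulting coefficient would be theoretical rather than operational.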
g. The Results Obtained in Criterion-Related Validity Studies Should Be Interpreted Against the Background of the Relevant Research Literature. Cumulative research knowledge plays an important role in any science. In interpreting the results of validity studies, the researcher should take into account the previous relevant research literature as well as the specific study at hand. A history of similar findings in the research literature lends additional credence to the results of individual studies. On the other hand, dissimilar findings should be viewed with caution.

h. The Researcher Should Ordinarily Make an Assessment of the Practical Value (or Utility) of the Selection Procedure. There are several approaches to assessing the practical value of selection procedures. In some cases a judgment that a procedure is of significant practical value can be based on the consideration of validity, selection ratio, the number to be selected, and the nature of the job. Expectancy tables can also be useful for this purpose, as can the Taylor-Russell Tables. More sophisticated estimates of the impact of selection tests on the productivity of selectees can typically be obtained by using regression-based equations (Brogden, 1949; Cronbach & Gleser, 1965; Schmidt, Hunter, McKenzie, & Muldrow, 1979). Both productivity gains per selectee and total productivity gains due to use of the procedure are relevant in assessing the practical value of selection procedures.

i. Data Should be Free from Clerical Error. Keypunching, coding, and computational work should be checked carefully and thoroughly.

Content-Oriented Strategies

Content-oriented predictor development or choice, if properly conducted, provides evidence that a selection procedure samples job requirements. The following provides guidance for the development of predictors from which valid inferences can be made.

Appropriate development of a selection procedure on the basis of content requires developing the procedure to be an appropriate sample of a specified content domain. If a selection procedure is to be used for employment decisions, the relevant content domain is performance (or the knowledge, skill, or ability necessary for performance) on the job, in relevant job training, or on specified aspects of either (Lawshe, 1975). A procedure may be a sample of a given domain, but if that domain is not an important part of the job, the value of the procedure for employment purposes is negligible.

Content sampling is properly involved in the construction or choice of any selection procedure, whether scores are to be interpreted as measures of achievement or as measures of work behavior. This discussion is limited, however, to situations in which the assessment is evaluated solely in terms of content sampling. It should be noted that content sampling is as useful in the construction and evaluation of criterion measures as it is for selection procedures used for employment decisions.

In content sampling, any inference about the usefulness of a score must be preceded by a set of inferences about the instrument itself based on the method of its construction (Messick, 1975). For that reason, the emphasis of this section and of its title is on the development of content-oriented assessment instruments rather than on inferences from scores.
Any hvaluation of existing selection pro- cetlures in terms of adequacy of content sampling might follow parallel con- siderations. A. The Job Content to he Sampled Should he Defined. That definition should 1 e based on an understanding of the job, organization needs, labor markets, and other considerations leading to personnel kpecifications and relevant to the Organization's purposes. The domain need nk he inclusive insofar as any larger domain is concerned. By this we mean that It does not have to cover the entire universe of topics covered in a training coursje or of duties of a particular job. In fact, there may he many domains in the total! content universe for any given job. For both what it does and does not includ*. a job content domain should he completely defined and thoroughly described[ In defining a content domain, it is ess@ntial that the degree of generality needed in a selection procedure he specified in advance. For example, the extent to which the job is likely to change should bel known. If job changes are likely to be at problem, the researcher may wish to develop a selection procedure which is quite general; e.g., eliminating material like specific sales prices which may change from month to month and concentrating on coihhtent which is less specific. The more it selection procedure has point-in-time fidelity to exact job operations, the less likely it is to have enough generality to remain appropriate in view of job changes. Also, the more a selection procedure is a specific sample of a domain involved in one job, the less likely it is to apply to other si}nilar jobs. Specificity and generality form the ends of a continuum, and no one ekcept the researcher can determine how general a selection procedure should b k-.. The important thing is that the researcher be aware in advance of conditions which may affect the generality decision; and that the generality decision mtj.st have a clear rationale based on the specific selection situation at hand, organizational needs, anticipated changes in technology, equipment, and work assignments, and human and economic considerations. This principle also applies in the development of content-oriented criteria for use in a predictive or concurrent !criterion-related study. The degree Approved For Release 2001/11/07 CIA-RDP00-01458R000100110001-9 Approved For Release 2001/11/07 : CIA-RDP00-01458R000100110001-9 to which the results of the study can be generalized will depend partly on the generality of the criteria and their applicability over time and jobs. B. Special Circumstances Should be Considered in Defining Job Content Domains. Domain definitions need not follow any prescribed format. There are many instances in which domains must be described differently depending on the exact situation. It may even be necessary to assess possible measurement problems in advance of domain definition. Generally, in the case of work samples, the closer a domain is to the totality of the job, the more difficult the procedure is to admin- ister and score. For example, cleaning dirty mechanisms may be part of a mechanic's job, but it may be impossible to develop a test so that every examinee would have the same amount and kind of dirt to remove. In this situation, it would be appropriate to eliminate such cleaning tasks from the test domain. 
Similarly, seldom used symbols such as the hyphen or question mark appear in different places on different typewriter keyboards; thus, it might be appropriate to limit a typing domain to alpha and numeric characters which are standard on all typewriters. Also, a short course designed to select persons for a longer course should not be based on a domain involving the totality of the longer course, because the advanced lessons require knowledge gained in the beginning lessons. In this situation, the domain should be defined only in terms of lessons which require no prior knowledge. Again, judgment must be used in defining a domain, and the rationale involved must be explicitly described.

C. Job Content Domains Should be Defined on the Basis of Accurate and Thorough Information About the Job(s). A content domain should ordinarily be defined in terms of tasks, activities, or responsibilities or specific abilities, knowledge, or job skills found to be prerequisite to effective behavior in the domain. This means conducting a job analysis. This may be a formal investigation, or the pooled judgments of informed persons such as production engineers, job incumbents, their supervisors, or personnel specialists. (See p. 4.) The term "ability" is difficult to define and distinguish from "skill," and it is important to note here that the use of the former term does not imply that content validity is a sufficient justification for the use of abilities or for such characteristics as empathy, dominance, leadership aptitude, and other broad psychological traits. Justification for the measurement and use of such traits must be based on empirical data rather than content sampling alone. It also follows that many procedures developed for general use in a variety of situations are not appropriate samples of a properly defined domain of job content. In particular, general intelligence tests are not appropriately justified by content sampling.

Job requirements assessed by other than formal tests may be established on the basis of content. Requirements for or evaluation of specific prior training, experience, or achievements can be content valid on the basis of the relationship between the content of the training, experience, or achievements and the content of the job for which the training, experience, or achievements are evaluated or required. The critical consideration is the similarity between the products, knowledges, skills, or abilities demonstrated in the experience, training, or achievements and the products, knowledges, skills, or abilities required on the job, whether or not there is a close resemblance between the experience, training, and achievements as a whole and the job as a whole.

D. Job Content Domains Should be Defined in Terms of Those Things an Employee is Expected to Do Without Training or Experience on the Job. It is important to delineate what knowledge, skills, and abilities an employee is expected to have before placement on the job, and to define the selection domain in those terms. This definition process often is not simple. There is a fine line between what an employee brings to the job and what he or she is taught on the job. In many instances, those who bring more learning to the job require shorter or different training than others.
It is incumbent on the investigator to seek the appropriate balance between selection and training and to define the content domain for the procedure in accordance with this balance (Goldstein, 1980). The point here is that selection does not occur independently of training, and this fact must be taken into account. The principle stated here does not preclude relegating different levels of the same ability to selection and training. For example, the fact that an employee is taught to read and interpret company technical manuals does not mean that the job applicant should not be evaluated for basic reading skills.

E. A Job Content Domain May be Restricted to Critical or Frequent Activities or to Prerequisite Knowledge, Skills, or Abilities. There is no virtue in measuring ability to handle trivial aspects of work. On the other hand, a single activity may be so important that it constitutes a single domain for measurement purposes. For example, a truck driver must be able to drive a truck. The fact that he or she may perform other functions is irrelevant to developing a measure of driving skill or ability.

F. Sampling of a Job Content Domain Should Ensure that the Measure Includes the Major Elements of the Defined Domain. Sampling the job content domain is the process of constructing or choosing the selection procedure. If the domain is defined properly, e.g., excludes those things not appropriately measured, learned on the job, or trivial, there should be little difficulty in moving fairly directly from domain elements to selection procedure elements. Any sampling done at this stage should have some rationale; e.g., the most critical elements are chosen. Random sampling is not usually appropriate in this area. Generally, the acceptability of the selection procedure rests on the extent to which elements of the procedure domain match elements of a job content domain.

G. A Test Developed on the Basis of Content Sampling Should Have Appropriate Measurement Properties. Linn (1979) has pointed out that there are contradictions between strategies based on domain considerations and those based upon score considerations. A very simple example of these problems is the question of what to do with a test item which is either too easy or too difficult and thereby contributes nothing to the total score variance. Under a score or norm-referenced strategy, the item would be eliminated. Using a domain or criterion-referenced strategy, the item would be retained.

Although there is much opportunity for further discussion in this area, it appears that for selection purposes, as opposed to achievement measurement purposes, the investigator should resolve many of the differences between the strategies in the direction of norm-referenced strategies. For purposes of selection, it is appropriate to consider the instrument involved as predictive in nature in the sense that the evaluation is intended to measure the probability of job success.

As can be noted from previous sections, if one considers even limited needs for generality, the selection procedure developed will ordinarily be less than a representative sample of any content domain, although in the development process every reasonable effort should be made to maintain content domain relevance in the selection procedure. The following suggestions are made to provide effective measurement in a predictive instrument.
1. Where feasible, the selection procedure should be subjected to pretesting and an analysis of the procedure in terms of the means, variances, and intercorrelations of its parts.

a. Parts which do not contribute to the total variance should be considered for elimination. Any replacement parts should reflect the same area of the content domain as those parts which were eliminated.

b. When a critical score is specified in advance and is not expected to fluctuate with labor market conditions or other events, parts which yield maximal discrimination at that score level should be selected. However, any selection of parts should take into consideration the sampling of the original content domain; i.e., a test item from one subject matter area should not normally be replaced with one from another subject matter area simply on the basis of item statistics. Furthermore, any efforts to increase total variance should take into consideration the need to reflect the content domain.

c. Questions dealing with intercorrelation of parts should be dealt with judiciously. Extreme redundancy of measurement should be avoided. Redundancy reduction may be achieved to some extent through reduction of job analysis data preliminary to domain definition, or it may be effected through analysis of trial administrations of the selection procedure. Redundancy reduction in content-oriented test construction is somewhat analogous to test selection through multiple regression techniques in criterion-related methodology. However, in reducing redundancy one should consider the need for a certain amount of redundancy to provide adequate reliability of measurement. Well constructed parts which do not correlate with other parts or a total score should not necessarily be eliminated. Many domains relative to job performance are multidimensional. For example, a typist who can hit the correct keys cannot necessarily do the arithmetic necessary to do the set-up of the columns for a numerical table. If the lack of correlation among selection procedure parts is merely reflective of the lack of correlation of parts of the content domain, it is appropriate to include the uncorrelated parts in the selection procedure.

2. Reliability is a matter of concern in all measurement, but it is a particular concern when work samples are involved. Equipment may wear or function variably; scoring variations may occur; a desire to minimize testing time may result in taking a sample too small to ensure reliable results; practice and fatigue effects may also be a problem. The foregoing is not meant to suggest that work samples are inappropriate; obviously, for many situations, they are appropriate measuring devices. However, unreliable work sample scores are not to be preferred over well constructed, reliable paper-and-pencil scores.

3. Scoring schemes for content-oriented tests should be ascertained to be correct. Multiple correct answers should be avoided unless they are clearly justified by information about the job.

4. Interpretation of content-oriented selection procedures may reflect the measurement properties of the given procedure. If a selection instrument yields reliable results and provides adequate discrimination in the score ranges involved, persons may be ranked upon the basis of its results.
However, if an instrument is constructed more in the manner of a training mastery test, in which the examinee is expected to get all or nearly all of the items correct, a critical score may be in order. A critical score is also in order in situations such as those in which the greater speed at which a typist can type cannot be reflected in production because of equipment or process limitations. In this case, the selection procedure should be designed with the limiting conditions considered.
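To make the pretest analysis suggested in point 1 (and the reliability concern in point 2) more concrete, the following sketch computes part statistics, part-remainder correlations, and coefficient alpha for a small set of trial data. The data and the choice of coefficient alpha are illustrative assumptions for this sketch only; these Principles do not prescribe a particular index, and any decision to drop or replace a part would still have to respect the content domain as described above.

    import numpy as np

    # Illustrative pretest data: rows are examinees, columns are scored parts
    # of a trial selection procedure (all values are hypothetical).
    scores = np.array([
        [4, 3, 5, 2],
        [2, 2, 3, 1],
        [5, 4, 4, 3],
        [3, 1, 4, 2],
        [4, 4, 5, 4],
        [1, 2, 2, 1],
    ], dtype=float)

    n_examinees, n_parts = scores.shape
    total = scores.sum(axis=1)

    print("part means:    ", scores.mean(axis=0))
    print("part variances:", scores.var(axis=0, ddof=1))

    # Correlation of each part with the sum of the remaining parts; a near-zero
    # value flags a part that adds little to total-score variance and is a
    # candidate for review, subject to the content-domain caveats above.
    for j in range(n_parts):
        remainder = total - scores[:, j]
        r = np.corrcoef(scores[:, j], remainder)[0, 1]
        print(f"part {j + 1} vs. remaining parts: r = {r:.2f}")

    # Coefficient alpha as one rough index of internal-consistency reliability.
    alpha = (n_parts / (n_parts - 1)) * (1 - scores.var(axis=0, ddof=1).sum() / total.var(ddof=1))
    print(f"coefficient alpha: {alpha:.2f}")

Whether a low-correlating part is actually dropped remains a judgment about the content domain, not a purely statistical decision, as point c emphasizes.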
H. Persons Used in Any Aspect of the Development or Choice of Selection Procedures to be Defined on the Basis of Content Sampling Should be Clearly Qualified. Panels of experts (i.e., people with thorough knowledge of the job(s)) may be used in defining domains, in writing test items, in developing simulation exercises, and in evaluating items or total procedures. The investigator should resist accepting people who are not thoroughly technically qualified. Furthermore, any individuals involved in the procedure construction or choice process should be thoroughly trained in those aspects of measurement necessary for their roles.

Generality of Validation Efforts

Only that which is generalizable beyond the specific, immediate situation will have much meaning or practical use except to that specific situation. As was pointed out earlier, the degree of generality to be sought must be determined from the total situation. Many questions regarding generality are still open to debate, but they are a matter of concern regardless of the validation strategy used. The two topics most closely associated with generality, construct strategies (which provide the ultimate in meaning and generality) and validity generalization, are discussed in this section.

A. The Use of Construct Applications in Employee Selection. That which has been called "construct validity" in various publications (Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice, 1978) is really an extension of the traditional concept of construct validity (American Psychological Association, Division of Industrial-Organizational Psychology, 1975). Lerner (1977) has spoken of the traditional (Cronbach & Meehl, 1955) concepts of construct validation as "an ideal perpetually to be sought, not a workable standard which can be legally imposed" (pp. 302-303). The same might well be said for professional advice. Consequently, exact principles for any extension or modification of this concept are difficult to prescribe. Investigators are advised that constructs are essentially theoretical concepts supported by disconfirmatory research. There is need for considerable research to support meaningful interpretations of many selection procedure variables. It appears that at present the best support is in the area of mental abilities (Ekstrom, 1973). The investigator is obligated to search the literature carefully regarding the disconfirmatory research supporting the construct he or she wishes to use in validation. The use of construct definitions without appropriate research support is unacceptable. The investigator is obligated to do his or her own research when the literature does not contain adequate data. Thus, the extension of construct validity often involves considerably more effort than other validation strategies. However, this effort needs to be undertaken and communicated. It is probably only through the generation of more theoretical data in the area of personnel selection that many of the pressing problems facing personnel selection specialists today can be solved. More must be known about the meaning of selection procedure scores (Dunnette & Borman, 1979), so that future research work in this area can go beyond the confines of specific procedures for specific jobs.

Although little guidance is offered here for an investigator faced with a selection situation in which traditional types of validation methodology are inappropriate or infeasible, it should be noted that there is growing concern by a number of researchers about the problems in this area. Cronbach (1980), for example, has proposed strategies less complicated than the traditional construct validation model. Considerable debate will certainly continue to center around validation strategies, and the investigator is advised to keep informed and evaluate carefully the literature in this area. In the meantime, those evaluating validation efforts should consider the total evidence relative to the evaluative task and not be constrained by previous conceptions of fixed models of validation.

B. Validity Generalization. Classic psychometric teaching has long held that validity is specific to the research study and that inability to generalize is one of the most serious shortcomings of selection psychology (Guion, 1976). As has been pointed out previously, current research is showing that the differential effects of numerous variables may not be as great as heretofore assumed. To these findings are being added theoretical formulations, buttressed by empirical data, which propose that much of the difference in observed outcomes of validation research is due to statistical artifacts (Callender & Osburn, in press; Schmidt, Hunter, Pearlman & Shane, 1979). Continued evidence in this direction should enable further extensions of validity generalization. Cooperative validation efforts being carried on by a number of trade and industry associations will provide the data necessary for evaluation. Such cooperative efforts are to be applauded and encouraged.
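As a rough illustration of the artifact argument cited above, the sketch below compares the variance of validity coefficients observed across several hypothetical studies with the variance expected from sampling error alone, in the general spirit of the procedures referenced (Schmidt, Hunter, Pearlman & Shane, 1979). The coefficients and sample sizes are invented, and corrections for other artifacts (criterion unreliability, range restriction) are deliberately omitted, so this is a sketch of the reasoning rather than a full validity generalization analysis.

    import numpy as np

    # Hypothetical validity coefficients and sample sizes from six local studies.
    r = np.array([0.18, 0.25, 0.31, 0.12, 0.28, 0.22])
    n = np.array([68, 95, 120, 54, 150, 80])

    # Sample-size-weighted mean validity and observed variance across studies.
    r_bar = np.average(r, weights=n)
    observed_var = np.average((r - r_bar) ** 2, weights=n)

    # Variance expected from sampling error alone, using a common
    # large-sample approximation based on the mean validity and mean N.
    sampling_var = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)

    share = min(sampling_var / observed_var, 1.0)
    print(f"weighted mean validity:       {r_bar:.3f}")
    print(f"observed variance of r:       {observed_var:.4f}")
    print(f"expected sampling-error var.: {sampling_var:.4f}")
    print(f"share attributable to sampling error: {share:.0%}")

In this fabricated example the sampling-error variance alone is large enough to account for all of the observed variation, which is the kind of result that motivates caution in treating study-to-study differences as true situational specificity.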
Implementation

Validation, discussed in the preceding section, is the investigatory phase in the development or choice of selection procedures. Whatever the outcome of such research, the researcher should prepare a report of the findings. The importance of documentation in the form of such a report is especially great if the assessment procedure is to be adopted for operational use. Many valid selection programs fail at the point of their implementation. The following principles are intended to assure effective and proper use of measures found valid.

A. Research Reports and Procedures Manuals. Validation research is rarely undertaken for the sake of research. Some general guidance on what to do after research follows.

1. Whenever an assessment procedure is made available for use in employment decisions, one or more documents should be prepared to describe validation research and the standard procedures to be followed in using the results of that research. Reports of validation research should include enough detail to enable a researcher competent in personnel assessment to know what was done, to draw independent conclusions in evaluating the work, and to replicate the study when feasible. This obviously means documentation which covers all essential variables, samples, and treatments. A basic principle in the preparation of such reports is that they should not be misleading. Research findings which might qualify the conclusions or the generalizability of results should be reported.

2. Informational material distributed should be accurate, complete for its purposes, and written in language that is not misleading. Memoranda and management records should be worded to communicate as clearly and accurately as possible the information that readers need to know to carry out their responsibilities competently and faithfully. Care must be taken in preparing such documents to avoid giving others within the organization an impression that an assessment program is more useful than it really is.

3. Research reports and procedures manuals should be reviewed periodically and revised as needed. Any changes in use or in research data that would make any statement in such documents incorrect or misleading should result in revision.

4. Research reports or procedures manuals should help readers make correct interpretations of data and should warn them against common misuses of information.

5. Procedures for administration or other use of a selection procedure should be written by a psychologist or other appropriately trained professional.

6. Any special qualifications required to administer a selection procedure or to interpret the scores or other measurements should be clearly stated in the research report and/or procedures manual.

7. Any claim made for any selection procedure should be supported in documentation with appropriate research evidence.

8. The procedures manual for persons who administer tests (or use other procedures) should specify the procedures to be followed and emphasize the necessity for standardization of administration, scoring, and interpretation. These instructions should be clear enough for all persons concerned to know precisely what they are supposed to do. It should be made clear to everyone involved that failure to follow standardized procedures may render the research results irrelevant to some degree. One must be both insistent and persuasive to get people to understand both the nature of and the need for standardized administration of tests or the use of other procedures. Periodic seminars run by psychologists or other appropriately trained professionals may be needed to reinforce the written instructions. Observational checks or other quality control mechanisms should be built into the system. There may be situations where research is based on data from operational studies where nonstandardized procedures may have been used and where the results show no serious impairment of validity. In such situations, the degree of standardization is shown to be relatively unimportant. This should not be assumed without investigation.

9. Any scoring or scaling procedures should be presented in the procedures
When keys must be kept confidential, this material should be rhade available only to persons who do the actual scoring or scaling of responses.;. 10. A research report should contain clear and prominent descriptions of the samples used in the research. Such information should also be summarized on any accompanying report forms in which scores are given with normative interpre- tations such as centiles or expectancies of su(tcess. Ordinarily, norm tables are less useful that expectancy charts for employment decisions. One should recognize, of course, that the expectancy chart is a norma- tive interpretation of test scores; i.e., it indicates the proportion of a specific sample of candidates who reach a specified ldvel of success. Norm tables may be useful in identifying the effects of a cutting score, even if not in interpreting individual employment procedure scores. 11. Any normative reporting should include measures of central tendency and variability and should clearly establish the) nature of the normative data given:, i.e., centiles, standard scores, expectancies, prledicted levels of attainment, etc. 12. Any derived scale used for reporting $cores should be carefully described in the research report or procedures manual. Whether using standard derived scores (such as those described in general texjbooks on measurement) or "home- grown" scales (such as "qualified;" "marginal," or "unqualified"), the researcher should make clear their logical and psychometric foundations. 13. Assumptions of validity generalized Irom promotional literature or testi- monial statements may not be used as evidence of the validity of the procedure. Validity evidence should he built on a foundation of systematic procedures like those discussed in this document. i B. flee of Research Results. Applicationlof data in the operational situation must be considered. There are a number of jtdgments to be made here. 1. It is the responsibility of the researcher to recommend specific methods of score interpretation to the user(s). Although the management of the organization usually retains the final decision on whether to use a specific selection procedure, it is the responsibility of the researcher to male recommendations on this question and on questions of how the procedure is to bet used. The recommended use should he consistent with the procedures with which] validity was established. 2. The utility of a selection procedure] should be considered in deciding whether to apply it operationally. In reaching the decision, consideration should he given to relative costs and benefits to both the organization and its employees. It is not recommended that procedures of marginal usefulness be applied, but a ,rocedure with at least some demonstrated qtility is ordinarily preferable to one of unknown validity or usefulness. Under usuktl circumstances, utility has a direct relationship to the coefficient of correlation QBrogden, 1949; Cronbach & Gleser, 1965) and, as mentioned previously, some methods of doing cost-benefit analysis on this basis have been developed (Schmidt, bunter, McKenzie & Muldrow, 1979). 3. Selection standards may be set as high or as low as the purposes of the organization require, if they are based on valid predictors. This implies that at the purposes of selection are clear and (b) they are acceptable in the social and legal context in which the employing organization functions. In usual circum- stances, the relationship between a predictor) and a criterion may be assumed to he linear. 
3. Selection standards may be set as high or as low as the purposes of the organization require, if they are based on valid predictors. This implies that (a) the purposes of selection are clear and (b) they are acceptable in the social and legal context in which the employing organization functions. In usual circumstances, the relationship between a predictor and a criterion may be assumed to be linear. Consequently, selecting from the top scorers on down is almost always the most beneficial procedure from the standpoint of an organization if there is an appropriate amount of variance in the predictor. Selection techniques developed by content-oriented procedures and discriminating adequately within the range of interest can be assumed to have a linear relationship to job behavior. Consequently, ranking on the basis of scores on these procedures is appropriate. It is not necessary to add any underlying trait assumptions in order to rank. As has been pointed out, in some circumstances, such as those where a production line limits the speed at which a worker can produce, a fixed critical score may be in order.

It is to be pointed out that judgment is necessary in all critical score establishment. A fully dependable numerical basis for a critical score is seldom, if ever, available. The only justification which can be demanded is that critical scores are determined on the basis of a reasonable rationale. This may involve such factors as estimated cost-benefit ratio, selection ratio, success ratio, social policies of the organization, or judgments as to required knowledge, skill, or ability on the job. If critical scores are used as a basis for decision (i.e., pass-fail points), the rationale or justification should be made known to users. This principle does not recommend critical scores in preference to other interpretive methods. Rather, the point is that, if critical scores are to be established, there should be some rationale and this rationale should be clearly communicated to users.

4. Employers should provide reasonable opportunities for reconsidering candidates whenever alternative forms for assessment exist and reconsideration is technically feasible. Under at least some circumstances, employers should allow candidates to reapply. There might be any of several reasons for questioning the validity of prior assessment for any given person. Where there has been opportunity for new learning, retesting or reevaluating is usually a desirable practice.

5. The use of a predictor, particularly a noncognitive predictor, should be accompanied by systematic procedures for developing additional data for continued research. Changing social, economic, technical, or other factors may operate over time to alter or eliminate validity. Periodic research is therefore necessary. A serious problem is that the operational use of a valid predictor may result in such severe restriction of range that its validity cannot be demonstrated in subsequent research (Peterson & Wallace, 1966). There is no well-established technology for checking the validity of instruments in use. However, researchers are urged to exercise their ingenuity to observe the principle that validity once demonstrated cannot be assumed to be eternal.
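The restriction-of-range problem noted in point 5 can be made concrete with a small simulation; the population validity, hiring rule, and sample size below are arbitrary assumptions used only to show how operational use of a valid predictor shrinks the correlation that remains observable among those actually hired.

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated applicant pool in which the predictor has a true validity
    # of about .40 against the criterion (illustrative values).
    n_applicants = 20000
    rho = 0.40
    predictor = rng.standard_normal(n_applicants)
    criterion = rho * predictor + np.sqrt(1 - rho ** 2) * rng.standard_normal(n_applicants)

    # Operational use: only the top 20% on the predictor are hired, so later
    # criterion data exist only for this restricted group.
    hired = predictor >= np.quantile(predictor, 0.80)

    r_full = np.corrcoef(predictor, criterion)[0, 1]
    r_hired = np.corrcoef(predictor[hired], criterion[hired])[0, 1]

    print(f"validity in the full applicant pool:  {r_full:.2f}")
    print(f"validity observable among hires only: {r_hired:.2f}")

The attenuated coefficient among hires is not evidence that validity has been lost, which is exactly why subsequent research on an instrument already in operational use must be interpreted with care.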
6. All persons within the organization who have responsibilities related to the use of employment tests and related predictors should be qualified through appropriate training to carry out their responsibilities. The psychologist or other person in charge of any selection program should know measurement principles and the limitations on the validities of interpretations of assessments. That person should understand the literature relevant to the selection procedure use or employment problems. Other persons in the organization may have some responsibilities related to the selection program. It is the responsibility of the person in charge to see to it that such persons have the training necessary to carry out those responsibilities competently. These considerations suggest the need for planned approaches to the training of technicians and managers involved in assessment procedures and in the interpretation of assessments.

7. Researchers should seek to avoid bias in choosing, administering, and interpreting selection procedures. They should try to avoid even the appearance of discriminatory practice. This is another principle difficult to apply. It goes beyond data analysis. The very appearance of bias may interfere with the effective performance of a candidate in the assessment situation. At the very least, a selection procedure user can create an environment that is responsive to the feelings of all candidates, insuring the dignity of all persons.

8. Researchers should recommend procedures which will insure periodic audit of selection procedure use. Departures from established procedures often occur over time. New findings in psychological or psychometric theory, or new social criticisms, may be relevant to one or more of the assessment procedures in use. The principle is that it should not be left to chance to find examples of misuse or of obsolete data. Some systematic plan for review should be followed.

9. The researcher should recommend procedures which will assure clerical accuracy in scoring, checking, coding, or recording selection procedure results. This principle applies to the researcher and to any agent to whom he or she has delegated responsibility. The responsibility cannot be abrogated by purchasing services from an outside scoring service.

10. The researcher must make considered recommendations for the operational use of a predictor in any instance in which the data appear to indicate differential prediction. A finding of differential prediction should not automatically lead to differences in predictor use for different groups. For example, if the study were based upon an extremely large sample, a finding of statistically significant differential prediction may have little practical impact. For another example, data apparently indicating differential prediction may be due to statistical artifacts or may suggest courses of action inconsistent with societal goals. In such situations, the reasonable course of action would be to recommend uniform operational use of the predictor for the different groups (or perhaps conduct further research). Should a finding of differential prediction be compelling enough to warrant further action, possible approaches to dealing with it are (1) replacing the selection procedures involved, or (2) using the selection procedure operationally, taking into account the differences in prediction results. Action under the second alternative should be in accordance with the definition of fairness upon which the study indicating differential prediction was based. In the absence of a compelling finding of differential prediction, the researcher should not recommend differential use of a selection procedure.
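One common way to examine data for differential prediction, consistent with point 10 and with the regression-based definitions cited in the references (e.g., Cleary, 1968), is to compare within-group regressions of the criterion on the predictor. The sketch below uses fabricated data for two groups generated from the same underlying regression, so similar fitted lines are the expected outcome; judgments about statistical significance, practical impact, and the appropriate definition of fairness would still have to be made as the text describes.

    import numpy as np

    rng = np.random.default_rng(1)

    def make_group(n, intercept, slope):
        """Fabricated predictor and criterion scores for one group."""
        x = rng.normal(50, 10, n)
        y = intercept + slope * x + rng.normal(0, 8, n)
        return x, y

    # Both groups share the same true regression in this illustration.
    groups = {
        "group A": make_group(150, intercept=10.0, slope=0.60),
        "group B": make_group(150, intercept=10.0, slope=0.60),
    }

    for label, (x, y) in groups.items():
        slope, intercept = np.polyfit(x, y, 1)
        residuals = y - (slope * x + intercept)
        see = residuals.std(ddof=2)  # standard error of estimate
        print(f"{label}: slope = {slope:.2f}, intercept = {intercept:.2f}, SEE = {see:.2f}")

    # Markedly different slopes, intercepts, or standard errors across groups
    # would be the signal that calls for the considered judgment described in
    # point 10 before any differential use of the procedure is recommended.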
11. The researcher or other user is responsible for maintaining security. This means that all reasonable precautions should be taken to safeguard materials and that decision makers should beware of basing decisions on scores obtained from insecure selection procedures. This principle is difficult to apply to non-test predictors such as judgments reached in an employment interview. Nevertheless, the principle of security, as it bears on standardization and the preservation of validity, may be applied to other variables as well. As an illustration of the extension of this principle, reference checks, for example, should be held confidential. Certainly, actual selection procedure scores should be released only to specified persons qualified to interpret them. Every reasonable effort should be made to avoid situations in which injury to the person or damage to the program can result.

12. All implementation procedures should be designed to safeguard the validity of the selection procedures. Any prior information given to candidates about the selection procedures should be uniform for all persons. Particular care should be taken so that some individuals do not, in the operation of the selection program, have advantages, such as coaching, that were not present during the validation effort. Finally, public disclosure of test content should be recognized as a serious threat to the validity, reliability, and subsequent development of testing procedures.

13. In making interpretations of scores, the researcher should be aware of situational variables which may on rare occasion introduce error. An individual score may lead to invalid inferences because of unusual features of the situation (e.g., uncommon distractions), exceptional characteristics of the individual (e.g., a sensory or physical handicap), or the passage of time (e.g., demonstrable new learning since the evaluation occurred). Sometimes these may form a basis for reevaluation. They may suggest the consideration of other information. The principle is that some degree of judgment be retained in the interpretation of scores obtained in circumstances differing from those in the validation research. Perhaps a better statement of the principle is that judgment should not be automatically ruled out in all situations.

14. Any record of scores should be kept in terms of raw scores. There have
Ordinarily, scores should not be reported to candidates or to managerial personnel unless they are explained carefully to make certain that interpretations are correct. In particular, one should not report scores to persons who may be asked later to provide criterion ratings for validation. 17. Scores on many tests developed for educational use are given in derived score form as I.Q.s or a grade-equivalent context. Such terms are to be avoided. These terms are highly subject to misinterpretation and not likely to be directly meaningful for employment use. Even where they had legitimate psychometric significance historically, they have been so encrusted with spurious meaning that they lend themselves to misinterpretation. 18. Selection procedures should be administered only to bona fide job candi- dates. Casual administration of selection procedures to supervisors and others who have no real need to take them can result in breaches of security and, at times, cause personal injury. This principle does not preclude administration for research purposes under appropriately controlled conditions. Legislation, Regulation, and Court Decision Opening paragraphs of this document carried a caveat prepared for the 1974 Standards and repeated in full in the 1975 Principles. One sentence of this caveat reads, "This document is prepared as a technical guide for those within the sponsor- ing professions; it is not written as law" (See Statement of Purpose, p. 4.) Neverthe- less, it would be folly for the researcher or practitioner to ignore relevant legislation, subsequent rule-making, and case law in developing strategies for the validation and use of personnel selection procedures (Sparks, 1977). At the federal level (generally the most important since it typically preempts state legislation if there is conflict) the basic historical statute referring to testing is Title VII of the Civil Rights Act of 1964. This is the basic authority for the various guidelines on employee selection procedures issued by the Equal Employ- ment Opportunity Commission and other EEO enforcement agencies (1978). The U.S. Supreme Court noted initially that the interpretations of the enforcement agency were entitled to "great deference" (Griggs v. Duke Power Co., 1971). Later cases (Albemarle v. Moody, 1975; Washington v. Davis, 1976) gave further interpretations involving use of selection procedures. Hundreds of lower court decisions have been rendered based on EEOC guidelines and on interpretations of the U.S. Supreme Court decisions (U.S. Office of Personnel Management, 1979; The Psychological Corporation, 1978). These guidelines and the court decisions sometimes conflict with precepts set forth in these Principles. More recently, however, the Supreme Court has been reexamining the relationship between agency guidelines and the judgments of psychometric experts as expressed in consensual documents like the APA Standards and the Division 14 Principles. In some cases such apparent conflicts have been resolved in a manner consonant Approved For Release 2001/11/071: CIA-RDP00-01458R000100110001-9 Approved For Release 2001/11/07 : CIA-RDP00-01458R000100110001-9 with the latter rather than the former (Lerner, ;1978). Nevertheless, the researcher or practitioner may need to perform additional analyses in order to satisfy these guidelines or case law. Recently, a new legislative approach has teen taken in the area of testing. Generally referred to as "Truth in Testing" legislation, proposals in the U.S. 
House of Representatives (Gibbons, 1979; Weiss, 1979) would require (among other things) that test publishers and users make available to examinees copies of their completed test papers or answer sheets with the correct answers marked, completely destroying the security of the tests and creating numerous inimical side effects which would decrease, if not destroy, the validity of the tests. To date, only state legislation has been passed (California and New York). Bills have been introduced in several other states. Researchers and practitioners should be alert to these developments.

Glossary

Assessment procedure: any method used to evaluate characteristics of persons.
Battery: a combination of two or more scores that predict job performance better than the individual scores alone.
Bias: any constant error; any systematic influence on measures or on statistical results irrelevant to the purpose of measurement.
Coefficient of correlation: an index number, which may be positive or negative, ranging from 0.00 to 1.00, indicating the extent to which two variables covary.
Concurrent validity: a demonstrated relationship between job performance and scores on tests administered to present employees.
Concurrent validity model: an approach to validation in which predictor and criterion information are obtained for present employees at approximately the same time.
Confidence interval: the bounds on a measurement that define a certain probability that the interval will include the parameter of interest.
Confidence limits: the upper and lower limits of the confidence interval.
Configural scoring: the assignment of weights to paired variables so that the implication of one predictor score depends upon the level of the second predictor score.
Construct: as used here, a trait of individuals inferred from empirical evidence (e.g., numerical ability).
Construct validity: a demonstrated relationship between underlying traits or "hypothetical constructs" inferred from behavior and a set of test measures related to those constructs. Construct validity is not established with a single study but only with the understanding that comes from a large body of empirical evidence.
Contamination: any systematic influence on measures or on statistical results irrelevant to the purpose of measurement; any bias or error.
Content domain: a body of knowledge and/or a set of tasks or other behaviors defined so that given facts or behaviors may be classified as included or excluded.
Content validity: a relationship between job performance and a test that is self-evident because the test includes a representative sample of job tasks. (A typing test is content-valid for a stenographer's job.) What constitutes a representative sample of tasks is determined through a job analysis.
Correlation: the degree to which two or more sets of measurements vary together; e.g., a positive correlation exists when high values on one scale are associated with high values on another.
Credibility limits: a term used in Bayesian statistics, roughly equivalent to confidence limits.
Criterion: some measure of job performance, such as productivity, accident rate, absenteeism, reject rate, training score, and so forth. It also includes subjective measures such as supervisory ratings.
Criterion-related validity: the statistical statement of the existence of a relationship between scores on a predictor and scores on a criterion measure.
Critical score: cutting score; a specified point in a predictor distribution below which candidates are rejected.
Cross validation: the application of a scoring system or set of weights empirically derived in one sample to a different sample (drawn from the same population) to investigate the stability of relationships based on the original weights.
Derived score: a scale of measurement using a system of standard units (based perhaps on standard deviations or centiles), to which obtained scores on any original scale may be transformed by appropriate numerical manipulation.
Expectancy table: a table or chart used for making predictions of levels of criterion performance for specified intervals of predictor scores.
Feasible: capable of being done successfully; i.e., in criterion-related research, economically practical and technically possible without misleading or uninterpretable results.
Job analysis: a method of analyzing jobs in terms of the tasks performed; the performance standards and training content; and the underlying knowledges, skills, and abilities required.
Linear combination: the sum of scores (whether weighted differentially or not) on different assessments to form a single composite score; distinguished from nonlinear combinations in which the different scores may, for example, be multiplied instead of added.
Moderator variable: theoretically, a variable which is related to the amount and type of relationship between two other variables.
Normative: pertaining to norm groups, i.e., the sample of subjects from which were obtained descriptive statistics (e.g., measures of central tendency, variability, or correlation) or score interpretations (e.g., centiles or expectancies).
Objective: verifiable; in measurement, pertaining to scores obtained in a way that minimizes bias or error due to different observers or scorers.
Operational independence: gathering of data by methods that are different in procedure or source so that measurement of one variable, such as a criterion, is not influenced by the process of measuring another variable.
Predictive validity: a demonstrated relationship between test scores of applicants and some future behavior on the job.
Predictive validity model: an approach to validation in which predictor information is obtained at or near the time of hire and criterion information is obtained at a later date.
Predictor: a measurable characteristic used to predict criterion performance, e.g., scores on a test, judgments of interviewers, etc.
Psychometric: pertaining to the measurement of psychological characteristics such as aptitudes, personality traits, achievement, skill, knowledge, etc.
Raw score: the unadjusted score on a test, usually determined by counting the number of correct answers but sometimes determined by subtracting a fraction of the wrong answers from the number of correct answers.
Regression equation: an algebraic equation which may be used to predict criterion performance from specific predictor scores.
Relevance: the extent to which a criterion measure accurately reflects the relative standing of employees in important job performance dimensions or behaviors.
Reliable: consistent or dependable; repeatable; reliability refers to the consistency of measurement.
Replication: a repetition of a research study designed to investigate the generality or stability of the results.
Restriction of range: a situation, varying in degree, in which the variability of data in a sample is less than the variability in the population from which the sample has been drawn.
Score: any specific number in a range of possible values describing the assessment of an individual; a generic term applied for convenience to such diverse kinds of measurement as tests, production counts, absence records, course grades, or ratings.
Standard deviation: a statistic used to describe the variability within a set of measurements, based on the differences between individual scores and the mean.
Standard score: a score which describes the location of a person's score within a set of scores in terms of distance from the mean in standard deviation units; may include scores on certain derived scales.
Suppressor variable: a predictor variable essentially unrelated to the criterion, but highly related to a second predictor, which presumably reduces the invalid variance in the latter when both are entered into a multiple R.
Synthetic validation: an approach to validation in which the validity of a test battery put together for a specific use may be inferred from prior research relating predictors to specified and relevant criterion elements.
Transformed score: any raw score that has undergone a transformation in scale (usually linear) such that the transformed scores have a predetermined mean and standard deviation.
Utility: the practical usefulness of a relationship (such as a validity coefficient) that allows the user to make better predictions, save money, improve efficiency, and so forth.
Validation: the process of investigation (i.e., research) through which the degree of validity of a predictor can be estimated. (Note: laypersons often misinterpret the term as if it implied giving a stamp of approval; they should recognize that the result of the research might be zero validity.)
Validity: the degree to which inferences from scores on tests or other assessments are justified or supported by evidence.
Validity coefficient: a coefficient of correlation showing the strength of relationship between predictor and criterion.
Validity generalization: the transportability of validity evidence; the application of validity evidence obtained in one or more situations to other situations.
Variability: the extent of individual differences in a particular variable.
Variable: a quantity that may take on any one of a specified set of values.
Variance: a measure of variability; the square of the standard deviation.

References

Albemarle Paper Co. v. Moody. 422 U.S. 405 (1975), 9 EPD 10230, 10 FEP 1181.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. Report of the Joint AERA, APA, NCME Committee for Review of the Standards for Educational and Psychological Tests. Unpublished manuscript, 1979. (Available from the American Psychological Association, Washington, DC.)
American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. Standards for educational and psychological tests. Washington, DC: American Psychological Association, 1974.
American Psychological Association, Division of Industrial-Organizational Psychology. Principles for the validation and use of personnel selection procedures. Dayton, OH: Author, 1975.
Baehr, M. E. Skills and attributes inventory. Chicago, IL: The University of Chicago, Industrial Relations Center.
Bemis, S. E. Occupational validity of the General Aptitude Test Battery. Journal of Applied Psychology, 1968, 52, 240-244.
Brogden, H. E. When testing pays off. Personnel Psychology, 1949, 2, 171-183.
Callender, J. C., & Osburn, H. G. Development and testing of a new model of validity generalization. Journal of Applied Psychology, in press.
Christal, R. E., & Weissmuller, J. J. New Comprehensive Occupational Data Analysis Programs (CODAP) for analyzing task factor information (AFHRL Interim Professional Paper No. TR-76-3). Lackland Air Force Base, TX: Air Force Human Resources Laboratory, 1976.
Cleary, T. A. Test bias: Prediction of grades of Negro and white students in integrated colleges. Journal of Educational Measurement, 1968, 5, 115-124.
Cohen, J. Statistical power analysis for the behavioral sciences. New York: Academic Press, 1977.
Cole, N. S. Bias in selection. Journal of Educational Measurement, 1973, 10, 237-255.
Cook, T. D., & Campbell, D. T. The design and conduct of quasi-experiments and true experiments in field settings. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology. Chicago: Rand McNally, 1976, 223-326.
Cook, T. D., & Campbell, D. T. Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally, 1979.
Cornelius, E. F., III, Carron, T. J., & Collins, A. N. Job analysis models and job classification. Personnel Psychology, 1979, 32 (4), 693-708.
Cronbach, L. J. Equity in selection: Where psychometrics and political philosophy meet. Journal of Educational Measurement, 1976, 13 (1), 31-41.
Cronbach, L. J. Selection theory for a political world. Public Personnel Management, 1980, 9 (1), 37-50.
Cronbach, L. J., & Gleser, G. C. Psychological tests and personnel decisions (2nd edition). Urbana, IL: University of Illinois Press, 1965.
Cronbach, L. J., & Meehl, P. E. Construct validity in psychological tests. Psychological Bulletin, 1955, 52, 281-302.
Darlington, R. B. Another look at "cultural fairness." Journal of Educational Measurement, 1971, 8, 71-82.
Dunnette, M. D. Personnel selection and job placement of disadvantaged and minority persons: Problems, issues, and suggestions. In H. L. Fromkin & J. J. Sherwood (Eds.), Integrating the organizations. New York: Free Press, 1974, 55-74.
Dunnette, M. D., & Borman, W. C. Personnel selection and classification systems. In M. R. Rosenzweig & L. W. Porter (Eds.), Annual Review of Psychology (Volume 30). Palo Alto, CA: Annual Reviews Inc., 1979, 477-525.
Einhorn, H. J., & Bass, A. R. Methodological considerations relevant to discrimination in employment testing. Psychological Bulletin, 1971, 75, 261-269.
Ekstrom, R. E. Cognitive factors: Some recent literature (Technical Report No. 2, ONR Contract N00014-71-C-0117, NR 150-329). Washington, DC: Office of Naval Research, July 1973.
Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice. Adoption by four agencies of uniform guidelines on employee selection procedures (1978). Federal Register, 1978, 43, 38290-38315.
Fine, S. A., & Wiley, W. W. An introduction to functional job analysis: Methods for manpower analysis (Monograph No. 4). Kalamazoo, MI: W. E. Upjohn Institute for Employment Research, 1971.
Gibbons, S. Truth in testing act of 1979. H.R. 3564, 96th Congress, 1st session, April 10, 1979.
Goldstein, I. L. Training in work organizations. In M. R. Rosenzweig & L. W. Porter (Eds.), Annual Review of Psychology (Volume 31). Palo Alto, CA: Annual Reviews, Inc., 1980, 229-272.
Griggs v. Duke Power Co. 401 U.S. 424 (1971), 3 EPD 8137, 3 FEP 175.
Guion, R. M. Synthetic validity in a small company: A demonstration. Personnel Psychology, 1965, 18, 49-63.
Guion, R. M. Employment tests and discriminatory hiring. Industrial Relations, 1966, 5, 20-37.
Guion, R. M. Recruiting, selection and job placement. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology. Chicago: Rand McNally, 1976, 777-828.
Lawshe, C. H. A quantitative approach to content validity. Personnel Psychology, 1975, 28 (4), 563-575.
Lerner, B. Washington v. Davis: Quantity, quality, and equality in employment testing. In P. Kurland (Ed.), The Supreme Court Review (1976 vol.). Chicago: University of Chicago Press, 1977.
Lerner, B. The Supreme Court and the APA, AERA, NCME test standards: Past references and future possibilities. American Psychologist, 1978, 33, 915-919.
Linn, R. L. Fair test use in selection. Review of Educational Research, 1973, 43, 139-161.
Linn, R. L. Single-group validity, differential validity, and differential prediction. Journal of Applied Psychology, 1978, 63, 507-512.
Linn, R. L. Critical issues in construct validity. Paper presented at Educational Testing Service Construct Validity Colloquium, Princeton, NJ, October 1979.
McCormick, E. J. Job analysis: Methods and applications. New York: AMACOM, 1979.
McCormick, E. J., Jeanneret, P. R., & Mecham, R. C. A study of job characteristics and job dimensions as based on the Position Analysis Questionnaire (PAQ). Journal of Applied Psychology, 1972, 56, 347-368.
McCormick, E. J., & Mecham, R. G. Job analysis data as a basis for synthetic test validity. Psychology Annual, 1970, 4, 30-35.
Messick, S. The standard problem: Meaning and values in measurement and evaluation. American Psychologist, 1975, 30 (10), 955-966.
Novick, M. R., & Jackson, P. H. Statistical methods for educational and psychological research. New York: McGraw-Hill, 1974.
Pass, J. J., & Cunningham, J. W. Occupational clusters based on systematically derived work dimensions (Final Report). Raleigh, NC: North Carolina State University, Center for Occupational Education, 1977.
Pearlman, K. Job families: A review and discussion of their implications for personnel selection. Psychological Bulletin, 1980, 87 (1), 1-28.
Pearlman, K., Schmidt, F. L., & Hunter, J. E. Validity generalization results for tests used to predict job proficiency and training success in clerical occupations. Journal of Applied Psychology, in press.
Petersen, N. S., & Novick, M. R. An evaluation of some models for culture-fair selection. Journal of Educational Measurement, 1976, 13, 3-39.
Peterson, D. A., & Wallace, S. R. Validation and revision of a test in use. Journal of Applied Psychology, 1966, 50, 13-17.
Prien, E. P., & Ronan, W. W. Job analysis: A review of research findings. Personnel Psychology, 1971, 24 (3), 371-396.
Primoff, E. S. Summary of job-element principles: Preparing a job-element standard. Washington, DC: U.S. Civil Service Commission, Personnel Measurement and Development Center, 1971.
Primoff, E. S. The J-coefficient procedure. Washington, DC: U.S. Civil Service Commission, Personnel Measurement and Development Center, 1972.
Schmidt, F. L., Hunter, J. E., McKenzie, R., & Muldrow, T. The impact of valid selection procedures on workforce productivity. Journal of Applied Psychology, 1979, 64 (6), 609-626.
Schmidt, F. L., Hunter, J. E., & Pearlman, K. Task differences as moderators of aptitude test validity in selection: A red herring. Journal of Applied Psychology, in press.
Schmidt, F. L., Hunter, J. E., Pearlman, K., & Shane, G. S. Further tests of the Schmidt-Hunter Bayesian validity generalization procedure. Personnel Psychology, 1979, 32 (2), 257-281.
Schmidt, F. L., Hunter, J. E., & Urry, V. W. Statistical power in criterion-related validity studies. Journal of Applied Psychology, 1976, 61, 473-485.
Schwartz, D. J. A probabilistic approach to adverse effect, job relatedness and criterion differences. Public Personnel Management, 1978, 7 (6), 368-377.
Sparks, C. P. Guidance and guidelines. The Industrial-Organizational Psychologist, 1977, 14 (3), 30-33.
Stanley, J. C. Reliability. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education, 1971.
The Psychological Corporation. Summaries of court decisions on employment testing: 1968-1977. New York: Author, 1978.
Thorndike, R. L. Concepts of culture fairness. Journal of Educational Measurement, 1971, 8, 63-70.
U.S. Office of Personnel Management. EEO court cases (September 1979 revision). Washington, DC: Author, 1979.
Wallace, S. R. Criteria for what? American Psychologist, 1965, 20, 411-417.
Washington v. Davis. 426 U.S. 229 (1976), 12 FEP 1415.
Weiss, J. Educational testing act of 1979. H.R. 4949, 96th Congress, 1st session, July 24, 1979.