Chapter 3: Understanding Test Quality - Concepts of Reliability and Validity

Test reliability and validity are two technical properties of a test that indicate its quality and usefulness. They are the two most important features of a test, and you should examine them when evaluating whether a test is suitable for your use. This chapter provides a simplified explanation of these two complex ideas. These explanations will help you understand the reliability and validity information reported in test manuals and reviews, and use that information to evaluate the suitability of a test for your use.
Chapter Highlights
Principles of Assessment Discussed

- Use only reliable assessment instruments and procedures.
- Use only assessment procedures and instruments that have been demonstrated to be valid for the specific purpose for which they are being used.
- Use assessment tools that are appropriate for the target population.

What makes a good test?

An employment test is considered "good" if the following can be said about it:
The degree to which a test has these qualities is indicated by two technical properties: reliability and validity.

Test reliability

Reliability refers to how dependably or consistently a test measures a characteristic. If a person takes the test again, will he or she get a similar test score, or a much different score? A test that yields similar scores for a person who repeats it is said to measure a characteristic reliably. How do we account for an individual who does not get exactly the same test score every time he or she takes the test? Some possible reasons are the following:
These factors are sources of chance or random measurement error in the assessment process. If there were no random errors of measurement, the individual would get the same test score, the individual's "true" score, each time. The degree to which test scores are unaffected by measurement errors is an indication of the reliability of the test.

Reliable assessment tools produce dependable, repeatable, and consistent information about people. In order to meaningfully interpret test scores and make useful employment or career-related decisions, you need reliable tools. This brings us to the next principle of assessment.

Principle of Assessment: Use only reliable assessment instruments and procedures. In other words, use only assessment tools that provide dependable and consistent information.

Interpretation of reliability information from test manuals and reviews

Test manuals and independent reviews of tests provide information on test reliability. The following discussion will help you interpret the reliability information about any test. The reliability of a test is indicated by the reliability coefficient. It is denoted by the letter "r" and is expressed as a number ranging between 0 and 1.00, with r = 0 indicating no reliability and r = 1.00 indicating perfect reliability. Do not expect to find a test with perfect reliability. Generally, you will see the reliability of a test reported as a decimal, for example, r = .80 or r = .93. The larger the reliability coefficient, the more repeatable or reliable the test scores.

Table 1 serves as a general guideline for interpreting test reliability. However, do not select or reject a test solely on the basis of the size of its reliability coefficient. To evaluate a test's reliability, you should consider the type of test, the type of reliability estimate reported, and the context in which the test will be used.

Table 1. General Guidelines for Interpreting Reliability Coefficients
Types of reliability estimates

There are several types of reliability estimates, each influenced by different sources of measurement error. Test developers have the responsibility of reporting the reliability estimates that are relevant for a particular test. Before deciding to use a test, read the test manual and any independent reviews to determine if its reliability is acceptable. The acceptable level of reliability will differ depending on the type of test and the reliability estimate used. The discussion in Table 2 should help you develop some familiarity with the different kinds of reliability estimates reported in test manuals and reviews.
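For example, a test-retest reliability estimate is simply the correlation between the scores people obtain on two administrations of the same test. Here is a minimal sketch in Python; the scores and the helper name `pearson_r` are hypothetical, for illustration only:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical scores for five people who took the same test twice
first_administration = [72, 85, 90, 64, 78]
second_administration = [70, 88, 91, 60, 80]

# People who scored high the first time also scored high the second
# time, so the test-retest estimate is close to 1.00.
r = pearson_r(first_administration, second_administration)
print(f"test-retest reliability estimate: r = {r:.2f}")
```

If repeating the test reshuffled people's relative standing, the correlation (and thus this reliability estimate) would drop toward zero.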
Table 3. General Guidelines for Interpreting Validity Coefficients

| Validity coefficient value | Interpretation |
| --- | --- |
| above .35 | very beneficial |
| .21 - .35 | likely to be useful |
| .11 - .20 | depends on circumstances |
| below .11 | unlikely to be useful |
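As an illustration, the interpretation bands in the table could be encoded as a small helper (the function name is hypothetical; the cut-offs come directly from the table):

```python
def interpret_validity(r):
    """Map a single-test validity coefficient to the Table 3 bands."""
    if r > 0.35:
        return "very beneficial"
    elif r >= 0.21:
        return "likely to be useful"
    elif r >= 0.11:
        return "depends on circumstances"
    else:
        return "unlikely to be useful"

print(interpret_validity(0.40))  # very beneficial
print(interpret_validity(0.15))  # depends on circumstances
```

Remember that such a lookup is only a starting point; as the text stresses, the coefficient must be weighed together with the other factors listed below.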
As a general rule, the higher the validity coefficient, the more beneficial it is to use the test. Validity coefficients of r = .21 to r = .35 are typical for a single test. Validities for selection systems that use multiple tests will probably be higher, because you are using different tools to measure and predict different aspects of performance, whereas a single test is likely to measure or predict fewer aspects of total performance. Table 3 serves as a general guideline for interpreting test validity for a single test. Evaluating test validity is a sophisticated task, and you might require the services of a testing expert. In addition to the magnitude of the validity coefficient, you should also consider at a minimum the following factors:
- level of adverse impact associated with your assessment tool
- selection ratio (number of applicants versus the number of openings)
- cost of a hiring error
- cost of the selection tool
- probability of hiring a qualified applicant based on chance alone.
Here are three scenarios illustrating why you should consider these factors, individually and in combination with one another, when evaluating validity coefficients:
Scenario One
You are in the process of hiring applicants where you have a high selection ratio and are filling positions that do not require a great deal of skill. In this situation, you might be willing to accept a selection tool that has validity considered "likely to be useful" or even "depends on circumstances" because you need to fill the positions, you do not have many applicants to choose from, and the level of skill required is not that high.
Now, let's change the situation.
Scenario Two
You are recruiting for jobs that require a high level of accuracy, and a mistake made by a worker could be dangerous and costly. With these additional factors, a slightly lower validity coefficient would probably not be acceptable to you, because hiring an unqualified worker would be too much of a risk. In this case, you would probably want to use a selection tool that reported validities considered to be "very beneficial" because a hiring error would be too costly to your company.
Here is another scenario that shows why you need to consider multiple factors when evaluating the validity of assessment tools.
Scenario Three
A company you are working for is considering using a very costly selection system that results in fairly high levels of adverse impact. You decide to implement the selection tool because the assessment tools you found with lower adverse impact had substantially lower validity, were just as costly, and making mistakes in hiring decisions would be too much of a risk for your company. Your company decided to implement the assessment given the difficulty in hiring for the particular positions, the "very beneficial" validity of the assessment, and your failed attempts to find alternative instruments with less adverse impact. However, your company will continue efforts to find ways of reducing the adverse impact of the system.
Again, these examples demonstrate the complexity of evaluating the validity of assessments. Multiple factors need to be considered in most situations. You might want to seek the assistance of a testing expert (for example, an industrial/organizational psychologist) to evaluate the appropriateness of particular assessments for your employment situation.
When properly applied, the use of valid and reliable assessment instruments will help you make better decisions. Additionally, by using a variety of assessment tools as part of an assessment program, you can more fully assess the skills and capabilities of people, while reducing the effects of errors associated with any one tool on your decision making.
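The benefit of combining tools can also be seen statistically: averaging several independent measurements of the same attribute shrinks the random-error component by roughly the square root of the number of measurements. A small simulation sketch, with all numbers hypothetical:

```python
import random
import statistics

random.seed(0)  # make the run reproducible
TRUE_SCORE = 50.0

def one_test_score():
    # One noisy measurement: the person's true score plus random error (SD = 10)
    return TRUE_SCORE + random.gauss(0, 10)

# Spread of observed scores: a single test vs. the average of four tests
single = [one_test_score() for _ in range(10_000)]
combined = [statistics.mean(one_test_score() for _ in range(4))
            for _ in range(10_000)]

# The averaged battery's error spread is about half the single test's
# (10 / sqrt(4) = 5), so the combined score sits closer to the true score.
print(f"error spread, one test:        {statistics.stdev(single):.1f}")
print(f"error spread, average of four: {statistics.stdev(combined):.1f}")
```

This is only a statistical idealization; real assessment tools are not fully independent of one another, so the gain in practice is smaller but still real.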
A document by the:
U.S. Department of Labor
Employment and Training Administration
1999