Reliability vs. Validity in Scientific Research
Written by MasterClass
Last updated: Mar 28, 2022 • 5 min read
In the fields of science and technology, the terms reliability and validity are used to describe the robustness of qualitative and quantitative research methods. While these criteria are related, the terms aren’t interchangeable.
What Is Reliability in Research?
Reliability describes the consistency of a measure, test, study, or experiment. If someone can collect their own data, repeat the procedure, and get consistent results each time, the test is considered reliable. For example, if an eye test conducted by different doctors over a period of weeks on the same patients produces the same results, the test is likely reliable, since the results are consistent.
What Is Validity in Research?
Validity describes the accuracy of a measure: whether the study measures what it intends to measure. For example, if a test asking participants to memorize the names of fruits produces results that differ by education level, it may be measuring education or vocabulary rather than memory; even if it is repeatable, it is not valid. A valid method will also correspond with other known and verified variables and theories.
4 Types of Reliability
Reliability, or the consistency of a study, is vital in both quantitative and qualitative research. There are many types of reliability, and a robust test should meet all appropriate types.
- 1. Test-retest reliability: This type of reliability is marked by the ability to reproduce results over time. If a test or measure is conducted repeatedly, with only the time of administration changing, and the results are the same or very similar, the high correlation coefficient between administrations indicates test-retest reliability.
- 2. Alternate form reliability: This form of reliability applies when a test is given in an altered form to subsequent participants. If the same test administered in different forms yields consistent results, it has alternate form reliability. This can help bolster the reliability of the test and is evidence of sound research design.
- 3. Inter-rater reliability: This reliability criterion pertains to the people who are conducting the test or taking the measure. If you get very close test scores across a range of different test administrators, then you have good inter-rater reliability, which is often expressed in a measure known as Cohen’s kappa.
- 4. Internal consistency reliability: With internal consistency, you are seeking high reliability in different portions of the same test, or in different tests meant to measure the same thing. For example, in a survey or questionnaire meant to identify the level of customer satisfaction, internal consistency reliability would mean satisfied customers consistently answer “Yes” to multiple positive questions (such as: Did you enjoy your experience, would you recommend our company to friends, etc.) while dissatisfied customers would consistently reply “No.” Internal consistency is often measured by Cronbach’s alpha, a number typically between zero and one, with 0.7 or higher generally indicating acceptable reliability.
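The two statistics named above, Cohen’s kappa and Cronbach’s alpha, can both be computed directly from raw scores. Here is a minimal Python sketch; the raters, survey items, and numbers are invented for illustration, not taken from any real study:

```python
from statistics import pvariance

def cohens_kappa(rater1, rater2):
    """Inter-rater agreement between two raters, corrected for chance."""
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    labels = set(rater1) | set(rater2)
    # Chance agreement: probability both raters pick the same label at random.
    p_expected = sum((rater1.count(l) / n) * (rater2.count(l) / n) for l in labels)
    return (p_observed - p_expected) / (1 - p_expected)

def cronbach_alpha(item_columns):
    """Internal consistency of k survey items; each column holds one
    item's scores across all respondents."""
    k = len(item_columns)
    item_variance = sum(pvariance(col) for col in item_columns)
    totals = [sum(row) for row in zip(*item_columns)]  # per-respondent total
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

# Two hypothetical doctors grading the same eight eye tests:
doctor_a = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
doctor_b = ["pass", "pass", "fail", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(doctor_a, doctor_b), 2))  # 0.75

# Hypothetical satisfaction survey: three items scored 1-5 by five customers.
items = [
    [4, 5, 3, 5, 2],  # "Did you enjoy your experience?"
    [4, 5, 3, 4, 2],  # "Would you recommend us to friends?"
    [5, 5, 2, 4, 1],  # "Would you shop with us again?"
]
print(round(cronbach_alpha(items), 2))  # 0.95, above the 0.7 rule of thumb
```

In this toy data, the doctors agree far more often than chance would predict (kappa of 0.75), and customers answer the three satisfaction items consistently (alpha of about 0.95), so both reliability checks pass.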
8 Types of Validity
Just as with reliability, there are different categories of validity. Depending on what you are assessing, you can measure the validity of your tests with one or more of the types listed below.
- 1. Construct validity: A measure or test shows construct validity when it is consistent with accepted and established theories or measures of the trait it is attempting to measure. A study of emotional maturity would have good construct validity if it accurately measured known traits that indicate emotional maturity, not factors that resemble emotional maturity but don’t indicate it.
- 2. Criterion validity: A test has criterion-related validity if its results either align with the results of other valid tests (concurrent validity) or predict future outcomes (predictive validity). For example, if a new job-satisfaction test produces results that correlate strongly with those of an accepted, validated version of the test given at the same time, it shows criterion validity, specifically concurrent validity. If its results accurately predict employees’ future performance, it shows criterion validity, specifically predictive validity.
- 3. Content validity: Content validity refers to the ability of your measures to cover all relevant aspects of the thing being measured or tested. If a general math aptitude test at a particular grade level included questions on addition, subtraction, and multiplication but not division, it would lack sufficient content validity, since a core mathematical operation is missing from the test.
- 4. Face validity: This type of validity tries to ascertain whether a measure or test seems reasonable on appearance. It is somewhat less formal than other measures since it relies on the impressions of those who are looking at the measure and the results it is giving, but is useful as a quick way to size up a test.
- 5. Internal validity: Internal validity refers to how well a test establishes a clear causal relationship between factors. If you are trying to measure the effect of income level on subjects’ anxiety, the measurement has to establish that low income itself, and not other factors such as work conditions or environment, is the dominant cause of the anxiety.
- 6. External validity: External validity looks at how well test results generalize to other contexts. If a test is too narrow and works only with a small data set, it won’t tell you much about the real world and therefore has limited use as a valid measure.
- 7. Convergent validity: A subcategory of construct validity, convergent validity measures the degree to which theoretically related constructs are actually related in practice.
- 8. Discriminant validity: Also a subcategory of construct validity, discriminant validity measures the degree to which theoretically unrelated constructs are actually unrelated in practice.
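Convergent and discriminant validity are typically checked by correlating scores across measures. Here is a minimal Python sketch with a hand-rolled Pearson correlation; the questionnaires, the shoe-size comparison, and all scores are invented assumptions for illustration:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented scores for six participants on three measures.
anxiety_new = [10, 14, 8, 20, 12, 16]   # new anxiety questionnaire
anxiety_old = [11, 15, 9, 19, 13, 17]   # established anxiety questionnaire
shoe_size   = [42, 40, 41, 42, 39, 40]  # theoretically unrelated trait

convergent = pearson(anxiety_new, anxiety_old)  # high: related constructs agree
discriminant = pearson(anxiety_new, shoe_size)  # near zero: unrelated constructs don't
```

With these numbers, the new anxiety scale correlates at roughly 0.99 with the established one (supporting convergent validity) and at roughly 0.10 with shoe size (supporting discriminant validity): related constructs track each other, unrelated ones do not.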
Reliability vs. Validity: How Are They Connected?
While reliability and validity are related terms, they measure different things. A test might be highly reliable but not valid. For example, if you ask employees to fill in a survey about job satisfaction and get the same results multiple times, the survey passes the reliability test. But if the questions actually measure general happiness or job performance rather than satisfaction, the survey is not valid.
On the other hand, validity almost always implies reliability. If your results accurately measure something in the world, they will also be repeatable; any problems with reproducing the test will stem from random error in data collection or mistakes by those administering it, not from the test itself.