Lurking Variable Basics: How Confounding Variables Skew Data
Written by MasterClass
Last updated: Oct 5, 2022 • 3 min read
When building a statistical model, extraneous variables can skew data or serve as a causal link that may fly under your radar. These lurking variables may be difficult to find at times, but it’s essential to know how to identify them if you want to ensure your research is sound. Learn more about what lurking variables are and how to identify them.
What Is a Lurking Variable?
A lurking variable is a variable that is left out of a statistical analysis yet influences the variables under study, so it can have an outsized impact on a statistical model.
In most observational studies, statisticians will choose an independent variable (an ostensible cause) and a dependent variable (the effect the independent variable allegedly causes). Lurking variables are additional, unmeasured variables that can serve as alternate causes affecting your dependent variable. In randomized trials, randomly assigning subjects to treatment groups tends to balance such variables across the groups, which makes it less likely that a lurking variable will distort the results.
As an example of how lurking variables can skew data, consider Simpson’s paradox. The statistician Edward H. Simpson described the effect in a 1951 paper: a trend that appears in a whole population can weaken, vanish, or even reverse when you break that population into subpopulations. One common reason for these discrepancies is a lurking variable that is distributed unevenly across the subpopulations.
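To make Simpson’s paradox concrete, here is a minimal Python sketch. The recovery counts, the treatment labels A and B, and the severity groups are all hypothetical numbers invented for illustration; they do not come from any real study.

```python
# Hypothetical counts, invented for illustration only:
# (patients recovered, patients treated) for treatments A and B,
# split by a lurking variable (case severity).
counts = {
    "mild":   {"A": (90, 100),   "B": (800, 1000)},
    "severe": {"A": (300, 1000), "B": (20, 100)},
}


def subgroup_rates(counts):
    """Recovery rate per treatment within each severity group."""
    return {
        group: {t: recovered / total for t, (recovered, total) in by_treatment.items()}
        for group, by_treatment in counts.items()
    }


def pooled_rates(counts):
    """Recovery rate per treatment after ignoring severity entirely."""
    totals = {}
    for by_treatment in counts.values():
        for t, (recovered, total) in by_treatment.items():
            rec_so_far, tot_so_far = totals.get(t, (0, 0))
            totals[t] = (rec_so_far + recovered, tot_so_far + total)
    return {t: recovered / total for t, (recovered, total) in totals.items()}


by_group = subgroup_rates(counts)
overall = pooled_rates(counts)

for group, rates in by_group.items():
    print(group, {t: round(r, 3) for t, r in rates.items()})
print("pooled", {t: round(r, 3) for t, r in overall.items()})
```

In this made-up data, treatment A has the higher recovery rate within both the mild and the severe group, yet treatment B looks better once the groups are pooled. The reversal happens because A was given mostly to severe cases and B mostly to mild ones, so severity is the lurking variable.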
Lurking Variable vs. Confounding Variable
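The terms “lurking variable” and “confounding variable” are often used interchangeably, although some textbooks draw a distinction, using “lurking” specifically for variables that were not measured or included in the study. In statistical analysis, the distortion that an extraneous variable introduces into an observational study is called “confounding”; the specific case in which that variable drives both the independent and the dependent variable is sometimes called “common response.” The term “lurking” comes from the implication that these variables lurk outside the statistician’s knowledge of the dataset until someone pinpoints them.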
Why Are Lurking Variables Important?
Lurking variables can have an extraordinary impact on statistical models. Here are just some reasons it’s essential to know about their existence:
- Lurking variables help you solve problems. Suppose you have a dataset where your independent and dependent variables don’t seem to correlate as expected. In a circumstance like this, identifying a lurking variable can show what might be going awry in your study design, because an unmeasured variable can mask a real relationship as easily as it can create a spurious one. Once identified, it can act as an explanatory variable for a previously inscrutable problem.
- Lurking variables inspire new research. Through regression analysis and other methodologies, statisticians can both identify lurking variables and use them to formulate new hypotheses. Determining causation can prove difficult. The more candidate variables you’re aware of, the better positioned you are to find what might be at the root of a problem.
- Lurking variables skew data. Unless you can identify lurking variables, you run the risk of an unknown third variable skewing your statistical analysis. For example, suppose you notice that drownings increase at the same time ice cream sales increase. Ice cream sales are unlikely to cause drownings; the lurking variable of hotter weather, which leads people both to buy ice cream and, independently, to go swimming, is the far more plausible cause of both trends.
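The ice cream example can be simulated in a few lines of Python. This is a sketch under loudly stated assumptions: the data are synthetic, the coefficients linking temperature to sales and drownings are arbitrary, and the helper names `pearson` and `partial_corr` are hypothetical. It compares the raw correlation between the two series with their partial correlation after controlling for temperature.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


# Synthetic data (assumption): temperature drives both series,
# and ice cream sales have no direct effect on drownings.
rng = random.Random(0)
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [5.0 * t + rng.gauss(0, 20) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 2) for t in temperature]

raw = pearson(ice_cream, drownings)
controlled = partial_corr(ice_cream, drownings, temperature)

print("raw correlation:", round(raw, 3))
print("controlling for temperature:", round(controlled, 3))
```

Under these assumptions the raw correlation is clearly positive, while the partial correlation sits close to zero: once temperature is accounted for, ice cream sales tell you almost nothing further about drownings.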
How to Identify a Lurking Variable
To identify a lurking variable, you need to know what you’re looking for in a statistical model. Consider these tips as you seek to find them in your own observational studies:
- Assess your data. Go through all your dependent and independent variables, coefficients, and other inputs. Ask yourself whether your research might be missing any relevant variables. After fitting a linear or nonlinear regression model (whichever suits your data), examine the residuals: a clear pattern in the residuals when you plot them against a variable left out of the model suggests that variable may be lurking.
- Define the lurking variable. Look at other studies and the lurking variables identified in them, then go back over your own research. Ask whether your dependent variable correlates more strongly, whether positively or negatively, with an extraneous variable than with the independent variable you chose at first. Try to do this before you even run your initial study if possible, as that will do the most to limit the potential damage.
- Rerun your study. If you think you’ve found an influential lurking variable, consider running a parallel study that measures or controls for this variable to home in on its potential causal importance. A common goal of statistical research is to find the most likely explanation for why the data look the way they do. Rerunning your study with the lurking variable included shows whether it does, in fact, have a strong and plausibly causal relationship to your other variables.
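As a rough illustration of what rerunning an analysis with a suspected lurking variable can look like, the Python sketch below fits a regression twice on synthetic data. Everything here is an assumption made for the example: the variables `x`, `y`, and `z`, the true effect sizes, and the helper names. The adjustment step regresses both `x` and `y` on `z` and then relates the leftovers, which gives the same coefficient for `x` as a multiple regression that includes `z`.

```python
import random


def slope_intercept(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """What is left of ys after removing its linear fit on xs."""
    slope, intercept = slope_intercept(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Synthetic data (assumption): z influences both x and y,
# and the true direct effect of x on y is 2.0.
rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
y = [2.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# First run: ignore z entirely.
naive_slope, _ = slope_intercept(x, y)

# Rerun: strip the part of x and y explained by z, then relate the leftovers.
x_given_z = residuals(z, x)
y_given_z = residuals(z, y)
adjusted_slope, _ = slope_intercept(x_given_z, y_given_z)

print("slope ignoring z:", round(naive_slope, 2))
print("slope adjusting for z:", round(adjusted_slope, 2))
```

With these settings the naive slope should come out well above the true value of 2.0 (near 3.5 in expectation), while the adjusted slope should land close to 2.0. A large shift like this between the two runs is one sign that the suspected variable matters.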