Breusch-Pagan test: A Comprehensive Guide to Detecting Heteroskedasticity in Regression

Heteroskedasticity is a familiar challenge for researchers and data analysts alike. It occurs when the variance of the error terms in a regression model is not constant across observations. This often produces inefficient estimates and biased standard errors, which in turn undermines confidence in hypothesis tests and confidence intervals. The Breusch-Pagan test is one of the most widely used diagnostic tools to check for heteroskedasticity in the residuals of an ordinary least squares (OLS) regression. In this extensive guide, we unpack the theory, practical implementation, interpretation, and common pitfalls of the Breusch-Pagan test, with clear examples and tips for researchers working in economics, finance, social sciences, and beyond.
Introduction to the Breusch-Pagan test
The Breusch-Pagan test, named after Trevor Breusch and Adrian Pagan who introduced the approach in the 1970s, is a Lagrange multiplier (LM) type test designed to detect whether the variance of the regression errors can be explained by the values of the independent variables. In plain terms, it asks: do the squared residuals from the primary regression show a systematic pattern when regressed on the regressors (or a transformation of them) used in the model? If yes, there is evidence of heteroskedasticity, and standard errors derived from OLS may be unreliable.
Origins and naming
The Breusch-Pagan test emerged from the broader family of LM tests, which focus on the idea that if a constrained model is truly correct, adding certain auxiliary terms should not improve the fit. In the context of heteroskedasticity, the auxiliary regression regresses the squared residuals on the explanatory variables. A significant R-squared value from this auxiliary regression suggests heteroskedasticity. The test is widely taught and implemented across econometrics textbooks and software packages, making it a staple in applied research.
Why heteroskedasticity matters in regression analysis
Ordinary least squares regression rests on several key assumptions, including homoskedasticity—the condition where the variance of the error term is constant across observations. When heteroskedasticity is present, several issues arise:
- Standard errors are biased, leading to unreliable t-statistics and p-values.
- Confidence intervals may be too narrow or too wide, depending on the form of heteroskedasticity.
- The OLS coefficient estimates remain unbiased and consistent, but they are no longer efficient: the Gauss-Markov theorem requires homoskedasticity, so OLS loses its best linear unbiased (BLUE) property, and you cannot trust the usual inference without corrections.
Detecting heteroskedasticity is therefore an essential step before carrying out inference. The Breusch-Pagan test provides a practical route to test whether the variance of the residuals is related to the explanatory variables in your model. It complements other tools such as robust standard errors, White’s test, and visual diagnostics, giving researchers a more complete picture of the data-generating process.
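To see why the conventional standard errors fail, it may help to write out the sampling variance of the OLS estimator for the linear model y = Xβ + ε. The sketch below assumes uncorrelated errors with observation-specific variances; the symbols Ω and σ_i^2 are introduced here purely for illustration.

```latex
\operatorname{Var}(\hat{\beta} \mid X)
  = (X'X)^{-1} X' \Omega X (X'X)^{-1},
\qquad
\Omega = \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2)

% Only under homoskedasticity, \Omega = \sigma^2 I, does this collapse to
\operatorname{Var}(\hat{\beta} \mid X) = \sigma^2 (X'X)^{-1}
```

The second expression is the one behind the conventional OLS standard errors, so those standard errors are only appropriate when every σ_i^2 equals the same σ^2.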
Mathematical intuition behind the Breusch-Pagan test
At its core, the Breusch-Pagan test exploits a simple idea: if the error variance is related to the regressors, then the squared residuals from the primary regression should be explainable by those regressors. The steps unfold as follows:
- Estimate the primary regression model using OLS and obtain the residuals ûi.
- Form the squared residuals, ûi^2, which serve as proxies for the local variances of the error term.
- Run an auxiliary regression where ûi^2 is regressed on a constant and the original regressors (or a subset or transformation of them).
- Compute R-squared (R^2) from the auxiliary regression. Multiply this R^2 by the sample size n to obtain the Breusch-Pagan statistic: BP = n × R^2.
- Compare the BP statistic to a chi-squared distribution with degrees of freedom equal to the number of regressors in the auxiliary regression (excluding the intercept). If BP exceeds the critical value, you reject the null hypothesis of homoskedasticity.
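Under the null hypothesis of homoskedasticity, the BP statistic asymptotically follows a chi-squared distribution with k degrees of freedom, where k is the number of regressors in the auxiliary regression, not counting the constant. The auxiliary regression should include a constant, but the constant does not add a degree of freedom. Strictly speaking, n × R^2 is the studentized form of the statistic proposed by Koenker; the original Breusch-Pagan statistic is half the explained sum of squares from a regression of the scaled squared residuals and relies on normally distributed errors.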
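The steps above can be sketched in a few lines of Python with NumPy. This is a minimal illustration on simulated data, not a reference implementation: the data-generating process, the seed, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Simulated data (an assumption for illustration): error s.d. grows with x
n = 500
x = rng.uniform(1.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.5 * x)

X = np.column_stack([np.ones(n), x])  # design matrix with a constant

# Primary OLS fit, residuals, and squared residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u2 = (y - X @ beta) ** 2

# Auxiliary regression of the squared residuals on the same regressors
gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
ss_res = np.sum((u2 - X @ gamma) ** 2)
ss_tot = np.sum((u2 - u2.mean()) ** 2)
r2_aux = 1.0 - ss_res / ss_tot

# BP = n * R^2, with df = number of non-constant auxiliary regressors
bp = n * r2_aux
df = X.shape[1] - 1
print(f"BP = {bp:.2f}, df = {df}")
```

With one non-constant regressor, the statistic is compared with a chi-squared distribution with 1 degree of freedom, whose 5% critical value is about 3.84.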
How to perform the Breusch-Pagan test in practice
Below is a step-by-step guide you can apply in common statistical software packages. The emphasis is on clarity and replicability, so you can adapt the steps to your data and preferred tools.
Step 1: Run the primary regression and obtain residuals
Estimate your OLS regression model of the form:
y = Xβ + ε
where y is the dependent variable, X is the matrix of regressors, β are the coefficients, and ε is the error term. Save the residuals ûi from this regression. It is crucial that you use the same specifications you intend to report in your analysis, including the correct treatment of any transformations or interactions.
Step 2: Create the auxiliary regression
Compute the squared residuals ûi^2. Regress these squared residuals on the explanatory variables X (the regressors used in the primary model) or a transformation thereof. A common specification is to include the same regressors as in the primary model, possibly plus higher-order terms or interactions if your theory suggests nonlinearity in the variance structure.
Step 3: Compute the Breusch-Pagan statistic
From the auxiliary regression, obtain R^2. Multiply by the sample size n to obtain the Breusch-Pagan statistic:
BP = n × R^2_auxiliary
In practice, many software packages will compute this directly as part of a dedicated heteroskedasticity test option.
Step 4: Decide using the chi-squared distribution
Choose the degrees of freedom equal to the number of regressors in the auxiliary regression (excluding the intercept). Compare BP to the critical value from the chi-squared distribution, or use the p-value reported by the software. A small p-value (commonly below 0.05) indicates evidence against the null hypothesis of homoskedasticity, suggesting heteroskedasticity is present.
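The decision step can be sketched without any third-party library. The helper below, chi2_sf, is a hypothetical name written for this example; it uses the closed-form upper-tail probability of the chi-squared distribution for integer degrees of freedom. In practice a library routine such as scipy.stats.chi2.sf would normally be used instead. The statistic and degrees of freedom are illustrative numbers only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of correction terms
    total = 0.0
    term = math.sqrt(2.0 * x / math.pi)  # first term, before the exp factor
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# Illustrative numbers (assumptions, not results from real data)
bp_stat, dof, alpha = 8.65, 3, 0.05
p_value = chi2_sf(bp_stat, dof)
reject_homoskedasticity = p_value < alpha
print(f"p-value = {p_value:.3f}, reject H0: {reject_homoskedasticity}")
```

Working with the p-value rather than a table of critical values makes it easy to change the significance level or the degrees of freedom when the auxiliary specification changes.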
Step 5: Interpret cautiously and consider alternatives
Statistical significance is one thing; practical significance and model specification are another. If the Breusch-Pagan test indicates heteroskedasticity, consider alternative specifications, transformations of the dependent variable (for example, a logarithm or Box-Cox transformation), robust standard errors (e.g., White or HC1), or even a different modelling framework that accommodates heteroskedasticity, such as weighted least squares when the form of heteroskedasticity is known.
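As an illustration of the weighted least squares option, the sketch below assumes the variance form is known: the error variance is taken to be proportional to x squared. That assumption, the simulated data, and the variable names are all hypothetical. WLS is then simply OLS on rows rescaled by the square root of the weights.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (assumption): Var(error_i) is proportional to x_i^2
n = 400
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5 * x)
X = np.column_stack([np.ones(n), x])

# Plain OLS for comparison
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# WLS with weights w_i = 1 / x_i^2: scale each row by sqrt(w_i) = 1 / x_i
scale = 1.0 / x
Xw = X * scale[:, None]
yw = y * scale
beta_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

print("OLS coefficients:", beta_ols)
print("WLS coefficients:", beta_wls)
```

Both estimators target the same coefficients; the point of the reweighting is that the transformed errors have constant variance, so WLS is efficient when the assumed variance form is correct. If the form is misspecified, robust standard errors remain the safer choice.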
Assumptions and limitations of the Breusch-Pagan test
The Breusch-Pagan test rests on several important assumptions and has certain limitations to keep in mind:
- Linearity of the variance structure: The test assumes that the variance of the error term is a linear function of the regressors (or their transformations). If heteroskedasticity arises in a highly nonlinear fashion, the Breusch-Pagan test may have reduced power.
- Independent observations: As with most regression-based tests, the Breusch-Pagan test presumes observations are independent. Serial correlation or clustering can distort the test’s size and power.
- Normality of errors is not required for the n × R^2 (studentized) form of the test, whereas the original form of the statistic assumes normal errors. In either case, heavy-tailed or highly skewed error distributions can affect finite-sample performance.
- Specification dependence: The choice of regressors in the auxiliary regression influences the test. If you omit relevant variables that govern the variance, you may fail to detect heteroskedasticity even when it exists.
- In the presence of model misspecification, the test results can be misleading. It is prudent to combine the Breusch-Pagan test with other diagnostic checks to form a coherent assessment of the model.
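The points about the functional form of the variance and the choice of auxiliary regressors can be illustrated with a small simulation. In the sketch below the error variance is U-shaped in x, so a purely linear auxiliary regression has little to pick up, while adding a squared term recovers the pattern. The data-generating process and the helper n_r2 are assumptions made for this example.

```python
import numpy as np

def n_r2(Z, target):
    """n * R^2 from an OLS regression of target on Z (Z includes a constant)."""
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    ss_res = np.sum((target - Z @ coef) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return len(target) * (1.0 - ss_res / ss_tot)

rng = np.random.default_rng(2)

# Simulated data (assumption): error s.d. depends on |x|, symmetric around 0
n = 500
x = rng.uniform(-3.0, 3.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.2 + np.abs(x))

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u2 = (y - X @ beta) ** 2

bp_linear = n_r2(X, u2)                                 # df = 1
bp_quadratic = n_r2(np.column_stack([X, x ** 2]), u2)   # df = 2

print(f"linear auxiliary regression:    BP = {bp_linear:.2f} (df = 1)")
print(f"quadratic auxiliary regression: BP = {bp_quadratic:.2f} (df = 2)")
```

The two statistics have different degrees of freedom, so they should be compared through their p-values rather than directly.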
Breusch-Pagan test vs other tests for heteroskedasticity
There are several alternative tests for heteroskedasticity, and each has strengths and weaknesses depending on the context. The most common companions to the Breusch-Pagan test are:
White test
The White test is a more general test that does not assume a specific functional form for the variance. It involves regressing the squared residuals on the original regressors, their squares, and cross-products. While the White test can detect a wider class of heteroskedasticity patterns, the number of auxiliary regressors grows quickly with the number of explanatory variables, which uses up degrees of freedom and can reduce power in small samples.
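The White auxiliary regression can be sketched as follows for two continuous regressors. The helper white_design, the simulated data, and the variable names are assumptions made for this example; note that squaring a binary regressor reproduces the regressor itself, so such duplicate columns would need to be dropped before counting degrees of freedom.

```python
import numpy as np
from itertools import combinations

def white_design(X_vars):
    """Constant, levels, squares, and pairwise cross-products of the columns."""
    n, k = X_vars.shape
    cols = [np.ones(n)]
    cols += [X_vars[:, j] for j in range(k)]
    cols += [X_vars[:, j] ** 2 for j in range(k)]
    cols += [X_vars[:, i] * X_vars[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack(cols)

rng = np.random.default_rng(3)

# Simulated data (assumption): variance depends on the squares of x1 and x2
n = 600
x1 = rng.uniform(-2.0, 2.0, size=n)
x2 = rng.normal(0.0, 1.0, size=n)
sd = np.sqrt(0.2 + x1 ** 2 + 0.5 * x2 ** 2)
y = 1.0 + 0.8 * x1 - 0.4 * x2 + rng.normal(0.0, sd)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u2 = (y - X @ beta) ** 2

# White auxiliary regression and n * R^2 statistic
Z = white_design(np.column_stack([x1, x2]))
gamma, *_ = np.linalg.lstsq(Z, u2, rcond=None)
r2 = 1.0 - np.sum((u2 - Z @ gamma) ** 2) / np.sum((u2 - u2.mean()) ** 2)
white_stat = n * r2
df_white = Z.shape[1] - 1  # 2 levels + 2 squares + 1 cross-product = 5

print(f"White statistic = {white_stat:.2f}, df = {df_white}")
```

With only two regressors the auxiliary regression already has five non-constant terms, which shows how quickly the degrees of freedom grow.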
Goldfeld-Quandt test
The Goldfeld-Quandt test is particularly useful when heteroskedasticity is suspected to arise from a specific ordering of observations, often in time-series data where variance shifts occur between segments. It is not as flexible as the Breusch-Pagan test when the form of heteroskedasticity is more diffuse or tied to multiple regressors.
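A minimal sketch of the Goldfeld-Quandt idea follows: order the observations by the variable suspected of driving the variance, drop a central block, fit separate regressions on the two ends, and compare their residual variances. The simulated data, the 20% central block, and the helper names are assumptions made for this example.

```python
import numpy as np

def design(v):
    """Design matrix with a constant and one regressor."""
    return np.column_stack([np.ones(len(v)), v])

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

rng = np.random.default_rng(4)

# Simulated data (assumption): error s.d. grows with x
n = 300
x = rng.uniform(1.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.5 * x)

# Order by the suspected variance driver and drop the central 20%
order = np.argsort(x)
x_sorted, y_sorted = x[order], y[order]
n_side = (n - n // 5) // 2
low, high = slice(0, n_side), slice(n - n_side, n)

k = 2  # parameters estimated in each sub-regression
df_each = n_side - k
var_low = rss(design(x_sorted[low]), y_sorted[low]) / df_each
var_high = rss(design(x_sorted[high]), y_sorted[high]) / df_each

gq_f = var_high / var_low
print(f"Goldfeld-Quandt F = {gq_f:.2f} with ({df_each}, {df_each}) df")
```

Under homoskedasticity the ratio follows an F distribution with (df_each, df_each) degrees of freedom, so values well above 1 point to a larger variance in the high-x group.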
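Glejser, Harvey, and other LM-type tests
Other LM-type tests can be appropriate in more specialised settings. Examples include the Glejser test, which regresses the absolute residuals on the regressors, and the Harvey test, which regresses the logarithm of the squared residuals. The similarly named Harvey-Collier test is a test of functional form rather than of heteroskedasticity. The choice depends on theoretical expectations about how heteroskedasticity may manifest in your data.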
Interpreting results from the Breusch-Pagan test
Interpreting the results involves weighing statistical significance against practical considerations about model specification. Key points to remember include:
- A statistically significant Breusch-Pagan test indicates evidence of heteroskedasticity. It does not identify the exact form of the heteroskedasticity, nor does it imply that the coefficient estimates are biased. The conventional standard errors, however, are biased, which affects inference.
- When heteroskedasticity is detected, robust standard errors (such as White’s or HC1 in many software environments) are often recommended to obtain valid standard errors and test statistics for the coefficients.
- Exploring transformations of the dependent variable (for example, logarithmic or Box-Cox transformations) or adding relevant predictors that potentially explain the variance can mitigate heteroskedasticity.
- In some cases, heteroskedasticity is a natural feature of the data, particularly in cross-sectional studies with diverse units, such as households, firms, or geographical regions. Acknowledging and reporting this feature is important for transparent research.
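The robust standard errors mentioned above can be computed directly from the sandwich formula. The sketch below implements the HC1 variant with NumPy on simulated data; the data-generating process and variable names are assumptions made for this example, and in applied work a library implementation would normally be preferred.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data (assumption): error s.d. grows with x
n = 500
x = rng.uniform(1.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.5 * x)
X = np.column_stack([np.ones(n), x])
k = X.shape[1]

# OLS fit and residuals
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
u = y - X @ beta

# Conventional covariance: s^2 (X'X)^{-1}
s2 = (u @ u) / (n - k)
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# HC1 covariance: n/(n-k) * (X'X)^{-1} X' diag(u^2) X (X'X)^{-1}
meat = (X * (u ** 2)[:, None]).T @ X
cov_hc1 = (n / (n - k)) * XtX_inv @ meat @ XtX_inv
se_hc1 = np.sqrt(np.diag(cov_hc1))

print("conventional SE:", se_classical)
print("HC1 robust SE:  ", se_hc1)
```

The coefficient estimates are the same in both cases; only the estimated standard errors, and therefore the t-statistics and confidence intervals, change.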
Practical case study: A simple example in economics
Imagine you are analysing a cross-sectional dataset of households, with annual expenditure as the dependent variable and income, age of the household head, and a binary variable capturing homeownership as regressors. You fit an OLS model and obtain residuals. The Breusch-Pagan test can help determine whether the variance of the residuals is related to income, age, and homeownership.
Steps you would take:
- Estimate the OLS model: Expenditure on Income, Age, and Homeownership.
- Save the residuals ûi and compute ûi^2.
- Regress ûi^2 on the same regressors (Income, Age, Homeownership) or a chosen transformation (e.g., log income, interaction terms, etc.).
- Compute R^2 from this auxiliary regression and multiply by the sample size to obtain BP.
- Compare BP to the chi-squared distribution with degrees of freedom equal to the number of regressors in the auxiliary regression. If the p-value is small, heteroskedasticity may be present, suggesting the need for robust standard errors or model re-specification.
Suppose the BP statistic is significant. In this case, you would report that the conventional standard errors are not reliable and present results using robust standard errors (e.g., White's or HC1). If heteroskedasticity remains after re-specification, you might keep the robust standard errors or switch to a modelling approach that directly accounts for the variance structure, such as weighted least squares.
Implementation across popular software environments
Many statistical packages have built-in options to perform the Breusch-Pagan test. Here is a compact overview of how to access the test in a few common tools. The exact syntax may vary by version, but the core idea remains consistent.
R
In R, the Breusch-Pagan test is available through several packages. A straightforward route is the bptest function from the lmtest package (an add-on package, not part of base R), applied to a fitted lm model. By default, bptest reports the studentized (Koenker) version of the statistic. Example:
    library(lmtest)
    model <- lm(y ~ x1 + x2 + x3, data = mydata)
    bptest(model)
For a more manual approach, you can extract residuals, square them, and run a regression of û^2 on X, then compute BP = n × R^2. Several tutorials show this explicit calculation for educational purposes.
Python (statsmodels)
In Python, the Breusch-Pagan test is accessible via the statsmodels.stats.diagnostic module. Example:
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    # Fit the OLS model (df is a pandas DataFrame with columns y, x1, x2, x3)
    X = sm.add_constant(df[['x1', 'x2', 'x3']])
    y = df['y']
    model = sm.OLS(y, X).fit()

    # BP test
    bp_test = het_breuschpagan(model.resid, model.model.exog)
    labels = ['Lagrange multiplier statistic', 'p-value', 'f-value', 'f p-value']
    print(dict(zip(labels, bp_test)))
Stata
Stata offers a simple command to conduct the Breusch-Pagan test after running regress. Example:
    regress y x1 x2 x3
    estat hettest
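The output includes the BP statistic and the p-value, along with the degrees of freedom. By default, estat hettest tests against the fitted values of y and assumes normally distributed errors; the rhs option uses the right-hand-side regressors instead, and the iid option gives the n × R^2 version of the statistic.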
EViews
EViews provides a diagnostic option within the regression output to test for heteroskedasticity, including the Breusch-Pagan approach, often found under residual diagnostics or heteroskedasticity tests.
Extensions and variations of the Breusch-Pagan test
Researchers often extend or modify the core idea of the Breusch-Pagan test to better accommodate specific data characteristics or theoretical concerns. Some notable variants include:
- Augmented Breusch-Pagan tests that incorporate additional transformations of the regressors or include lagged residuals to capture time-series dynamics.
- Heteroskedasticity-robust versions of the auxiliary regression, which can be beneficial when the data exhibit non-constant variance patterns not well captured by simple linear forms.
- Bayesian adaptations that treat the variance structure as a random process and integrate prior information about potential variance patterns.
When choosing among these options, consider the nature of your data (cross-sectional vs. time-series), the presence of potential clustering, and the size of your sample. In small samples, some tests have reduced power, so a complementary approach using multiple diagnostics is prudent.
Common pitfalls and best practices
To maximise the reliability of the Breusch-Pagan test results, keep these practical tips in mind:
- Ensure the primary model is specified with care. Misspecification can lead to spurious indications of heteroskedasticity or obscure genuine patterns.
- Check for influential observations and outliers, which can distort the auxiliary regression and unduly affect the test statistic.
- Be mindful of the chosen auxiliary regression: including inappropriate or redundant regressors may inflate or deflate the test statistic artificially.
- Consider the data structure. In the presence of clustering, multi-level models, or serial correlation, standard BP tests may not have the correct size. Robust or cluster-robust variants may be more appropriate.
- Use robust standard errors as a practical default when risk management, policy decisions, or business decisions depend on the inference drawn from the regression.
Interpreting practical outputs and reporting the Breusch-Pagan test
In a research report or paper, you should present a concise interpretation of the Breusch-Pagan test results:
- State the null hypothesis: the Breusch-Pagan test takes homoskedasticity (constant error variance) as its null.
- Report the BP statistic, the degrees of freedom, and the p-value. For example: “Breusch-Pagan test statistic = 8.65, df = 3, p = 0.034.”
- Provide a brief discussion about the implications for inference in your model. If heteroskedasticity is detected, explain the steps you took to mitigate it (robust standard errors, transformation, model re-specification, etc.).
- Comment on the potential limitations and any sensitivity analyses you performed, such as running the test with alternative auxiliary regression specifications or additional covariates.
Putting it all together: best practice workflow
For a robust applied analysis, integrate the Breusch-Pagan test into a broader diagnostic routine. A practical workflow may look like this:
- Initial OLS estimation with a theoretically informed set of regressors.
- Diagnostic plots of residuals versus fitted values and versus each regressor to visually inspect variance patterns.
- Breusch-Pagan test to check for variance that is linked to the regressors. If the test is non-significant, you may proceed with standard inference, while staying attentive to other diagnostics.
- If significant, implement robust standard errors and re-examine coefficient estimates and their significance. Consider alternative model specifications or data transformations that could stabilise the variance.
- Optionally, run White’s test or other heteroskedasticity diagnostics to verify the stability of conclusions across different tests.
- Document the final modelling choice, including how heteroskedasticity was addressed and why the chosen approach is appropriate for the data at hand.
Conclusion: the value of the Breusch-Pagan test in modern analysis
The Breusch-Pagan test remains a practical, conceptually straightforward tool for detecting heteroskedasticity in regression models. Its appeal lies in its simplicity, compatibility with standard OLS workflows, and adaptability across software environments. While it is not a panacea for all forms of variance instability, the Breusch-Pagan test provides a clear diagnostic signal that guides researchers on whether to rely on conventional inference or adopt more robust strategies. When combined with complementary diagnostics and thoughtful model specification, the Breusch-Pagan test helps ensure that conclusions drawn from empirical work are both credible and transparent.
As data landscapes evolve—with bigger datasets, richer variable structures, and diverse data-generating processes—the role of heteroskedasticity diagnostics like the Breusch-Pagan test remains central. It empowers analysts to acknowledge uncertainty, strengthen inference, and communicate findings with greater integrity. Whether you are working in a university econometrics lab, a policy research unit, or a corporate analytics team, understanding and applying the Breusch-Pagan test is a valuable skill in the modern statistician’s toolkit.