11.6 Chapter Summary
"The beauty of regression discontinuity is that it provides a transparent way to learn about causal effects." — Joshua Angrist & Jörn-Steffen Pischke, Mostly Harmless Econometrics
Master RDD to become a causal inference expert
Core Concepts Review
The Essence of RDD
Regression Discontinuity Design is a quasi-experimental method that exploits the discontinuity in treatment assignment rules at a threshold to identify causal effects.
Core idea:
"In the neighborhood of the cutoff, RDD is as good as a randomized experiment. The discontinuity is your friend." — Joshua Angrist
Key elements:
- Running Variable : Continuous variable determining treatment assignment
- Cutoff : Threshold where treatment assignment changes
- Treatment : Changes discontinuously at cutoff
- Outcome : Variable we care about
Sharp RDD vs Fuzzy RDD
Sharp RDD
Definition: Treatment status completely determined by cutoff
Identification:

$\tau_{SRD} = \lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x]$
Causal interpretation: Local average treatment effect at cutoff (LATE)
Classic cases:
- Lee (2008): 50% vote-share cutoff in U.S. House elections
- Carpenter & Dobkin (2009): minimum legal drinking age of 21
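The sharp-RDD estimand can be sketched numerically: fit a local linear regression on each side of the cutoff and difference the two boundary predictions. Below is a minimal simulation; all names and parameter values are invented for illustration, and a uniform kernel is used for simplicity (rdrobust defaults to a triangular kernel).

```python
import numpy as np

# Illustrative sketch of sharp RDD "by hand": local linear fits on each
# side of the cutoff, differenced at the cutoff.
rng = np.random.default_rng(0)
n, cutoff, tau = 5000, 0.0, 2.0
x = rng.uniform(-1, 1, n)
y = 1.0 + 0.8 * x + tau * (x >= cutoff) + rng.normal(0, 0.3, n)

def sharp_rdd(x, y, c, h):
    """Local linear sharp-RDD estimate with bandwidth h (uniform kernel)."""
    left = (x >= c - h) & (x < c)
    right = (x >= c) & (x <= c + h)
    # Fit a line on each side, evaluate both at the cutoff, difference.
    b_left = np.polyfit(x[left] - c, y[left], 1)
    b_right = np.polyfit(x[right] - c, y[right], 1)
    return np.polyval(b_right, 0.0) - np.polyval(b_left, 0.0)

tau_hat = sharp_rdd(x, y, cutoff, h=0.25)  # should be close to tau = 2.0
```

In practice you would use rdrobust, which adds data-driven bandwidth selection and robust bias-corrected inference; the sketch only makes the estimand concrete.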
Fuzzy RDD
Definition: Treatment probability jumps at cutoff, but not from 0 to 1
Identification (Wald estimator):

$\tau_{FRD} = \dfrac{\lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x]}{\lim_{x \downarrow c} E[D \mid X = x] - \lim_{x \uparrow c} E[D \mid X = x]}$

Essence: Instrumental variable (IV) estimation, with the cutoff indicator as the instrument
Causal interpretation: LATE for compliers
Classic case:
- Angrist & Lavy (1999): Class size (Maimonides' Rule)
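The Wald estimator can likewise be sketched by hand: the jump in the outcome divided by the jump in treatment probability, each estimated with local linear fits. A minimal simulation follows; names and values are invented, with a uniform kernel for simplicity.

```python
import numpy as np

# Illustrative fuzzy-RDD sketch: compliance is imperfect, so the
# treatment probability jumps at the cutoff but not from 0 to 1.
rng = np.random.default_rng(1)
n, c, tau = 20000, 0.0, 2.0
x = rng.uniform(-1, 1, n)
p_treat = np.where(x >= c, 0.8, 0.2)   # probability jumps 0.2 -> 0.8 at c
d = rng.binomial(1, p_treat)           # realized (imperfect) treatment
y = 1.0 + 0.5 * x + tau * d + rng.normal(0, 0.3, n)

def boundary_jump(x, v, c, h):
    """Jump in E[v | x] at c from local linear fits on each side."""
    left = (x >= c - h) & (x < c)
    right = (x >= c) & (x <= c + h)
    b_left = np.polyfit(x[left] - c, v[left], 1)
    b_right = np.polyfit(x[right] - c, v[right], 1)
    return np.polyval(b_right, 0.0) - np.polyval(b_left, 0.0)

h = 0.25
# Wald ratio: outcome jump divided by treatment-probability jump.
tau_fuzzy = boundary_jump(x, y, c, h) / boundary_jump(x, d, c, h)
```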
Core Assumptions
1. Continuity Assumption ⭐
Assumption: Potential outcome functions continuous at cutoff
In plain language: Without treatment, outcome doesn't jump at cutoff
Not directly testable, but can be verified indirectly through:
- Covariate balance tests
- Density test (McCrary Test)
- Placebo tests
2. No Precise Manipulation
Assumption: Individuals cannot precisely manipulate running variable to just cross cutoff
Test: McCrary density test
- $H_0$: Density of the running variable is continuous at the cutoff
- Reject $H_0$ → Possible manipulation
3. Local Exclusion
Assumption: Running variable affects outcome only through treatment (near cutoff)
Practice: Choose "exogenous" running variables (age, lottery numbers, exam scores, etc.)
Empirical Implementation Steps
1. Data Preparation
```python
import numpy as np
import pandas as pd
from rdrobust import rdrobust, rdplot, rdbwselect

# Center running variable and build the treatment indicator
df['X_centered'] = df['X'] - cutoff
df['D'] = (df['X'] >= cutoff).astype(int)
```

2. Visualization (RDD Plot)
```python
# Using rdplot
rdplot(y=df['Y'], x=df['X'], c=cutoff,
       title='RDD Plot',
       x_label='Running Variable',
       y_label='Outcome')
```

3. Main Effect Estimation
```python
# Using rdrobust (CCT optimal bandwidth + robust inference)
result = rdrobust(y=df['Y'], x=df['X'], c=cutoff)
print(result)
```

4. Validity Tests
```python
# Covariate balance
for cov in covariates:
    balance_test = rdrobust(y=df[cov], x=df['X'], c=cutoff)
    print(f"{cov}: p={balance_test.pval[0]:.4f}")

# Density test
from rddensity import rddensity
density_test = rddensity(X=df['X'], c=cutoff)
print(f"McCrary Test p-value: {density_test.pval[0]:.4f}")

# Placebo tests (false cutoffs)
placebo_cutoffs = [cutoff - 10, cutoff + 10]
for pc in placebo_cutoffs:
    placebo_result = rdrobust(y=df['Y'], x=df['X'], c=pc)
    print(f"Placebo {pc}: p={placebo_result.pval[0]:.4f}")
```

5. Robustness Tests
```python
# Bandwidth sensitivity (h_optimal: the CCT optimal bandwidth,
# e.g. taken from rdbwselect or from the main rdrobust output)
bandwidths = [0.5 * h_optimal, 0.75 * h_optimal, h_optimal, 1.25 * h_optimal]
for h in bandwidths:
    result_h = rdrobust(y=df['Y'], x=df['X'], c=cutoff, h=h)
    print(f"h={h:.2f}: Effect={result_h.coef[0]:.4f}")

# Donut-hole: exclude observations with |X - c| < 1
df_donut = df[np.abs(df['X_centered']) >= 1]
result_donut = rdrobust(y=df_donut['Y'], x=df_donut['X'], c=cutoff)
```

Common Pitfalls and Cautions
Pitfall 1: High-Order Polynomials (p ≥ 3)
Problem: Gelman & Imbens (2019) warn against using high-order polynomials
- Prone to overfitting
- Unreliable confidence intervals
Recommendation:
- Use local linear (p = 1)
- At most local quadratic (p = 2)
- Avoid p ≥ 3 (cubic and higher)
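The variance problem behind this recommendation is easy to see by simulation: with a true jump of exactly zero, the spread of boundary estimates grows sharply with polynomial order. A minimal Monte Carlo sketch (all values invented for illustration, uniform design, global fits on each side):

```python
import numpy as np

# Monte Carlo sketch of the Gelman & Imbens point: with NO true jump,
# boundary estimates from high-order global polynomials are far noisier
# than those from linear fits.
rng = np.random.default_rng(2)

def poly_jump(x, y, p):
    """Jump at x = 0 from separate global order-p polynomial fits."""
    b_left = np.polyfit(x[x < 0], y[x < 0], p)
    b_right = np.polyfit(x[x >= 0], y[x >= 0], p)
    return np.polyval(b_right, 0.0) - np.polyval(b_left, 0.0)

est_p1, est_p5 = [], []
for _ in range(200):
    x = rng.uniform(-1, 1, 500)
    y = 0.5 * x + rng.normal(0, 1, 500)   # true jump is exactly 0
    est_p1.append(poly_jump(x, y, 1))
    est_p5.append(poly_jump(x, y, 5))

# Spread of the order-5 estimates is several times larger.
sd_p1, sd_p5 = np.std(est_p1), np.std(est_p5)
```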
Pitfall 2: Ignoring Clustering
Problem: If data has clustered structure (schools, regions), standard errors underestimated
Solution: Use clustered standard errors
```python
result = rdrobust(y=df['Y'], x=df['X'], c=cutoff,
                  cluster=df['cluster_id'])
```

Pitfall 3: Not Reporting Multiple Cutoffs
Problem: If multiple cutoffs exist (e.g., 40, 80, 120), reporting only one creates selection bias
Solution:
- Report results for all cutoffs
- Or pool them (pooled RDD)
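One common pooling approach ("normalize and pool") re-centers each observation at its nearest cutoff and runs a single RDD on the centered running variable. A minimal sketch, where the column names, cutoff values, and data are all illustrative:

```python
import numpy as np
import pandas as pd

# Sketch of "normalize and pool" with multiple cutoffs.
cutoffs = np.array([40.0, 80.0, 120.0])
rng = np.random.default_rng(3)
df = pd.DataFrame({'X': rng.uniform(20, 140, 1000)})

# Distance to the nearest cutoff becomes the pooled running variable.
nearest = cutoffs[np.abs(df['X'].to_numpy()[:, None] - cutoffs).argmin(axis=1)]
df['X_pooled'] = df['X'] - nearest
df['D'] = (df['X_pooled'] >= 0).astype(int)

# Pooled estimate (assuming df also has an outcome column Y):
# result = rdrobust(y=df['Y'], x=df['X_pooled'], c=0)
```

The pooled estimate is a weighted average of cutoff-specific effects, so it is good practice to report the cutoff-specific estimates alongside it.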
Pitfall 4: Extrapolating Beyond the Cutoff
Problem: RDD estimates LATE at cutoff, not overall ATE
Lesson:
- Clearly state that you are estimating a local effect
- Don't extrapolate to the entire distribution
Literature Recommendations
Essential Reviews
Lee, D. S., & Lemieux, T. (2010). "Regression Discontinuity Designs in Economics." Journal of Economic Literature, 48(2), 281-355.
- The bible of RDD, must-read!
Imbens, G. W., & Lemieux, T. (2008). "Regression discontinuity designs: A guide to practice." Journal of Econometrics, 142(2), 615-635.
Methodological Breakthroughs
Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). "Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs." Econometrica, 82(6), 2295-2326.
- CCT robust inference
Gelman, A., & Imbens, G. (2019). "Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs." Journal of Business & Economic Statistics, 37(3), 447-456.
- Warning against high-order polynomials
Classic Applications
Thistlethwaite, D. L., & Campbell, D. T. (1960). "Regression-discontinuity analysis: An alternative to the ex post facto experiment." Journal of Educational Psychology, 51(6), 309-317.
- Birth of RDD
Angrist, J. D., & Lavy, V. (1999). "Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement." Quarterly Journal of Economics, 114(2), 533-575.
Lee, D. S. (2008). "Randomized Experiments from Non-random Selection in U.S. House Elections." Journal of Econometrics, 142(2), 675-697.
Carpenter, C., & Dobkin, C. (2009). "The Effect of Alcohol Consumption on Mortality: Regression Discontinuity Evidence from the Minimum Drinking Age." American Economic Journal: Applied Economics, 1(1), 164-182.
Recommended Textbooks
- Angrist & Pischke (2009). Mostly Harmless Econometrics, Chapter 6
- Cunningham (2021). Causal Inference: The Mixtape, Chapter 6
- Huntington-Klein (2022). The Effect, Chapter 20
Python Toolkit
Core Packages
```python
# Installation (shell):
# pip install rdrobust rddensity numpy pandas matplotlib seaborn statsmodels

# Imports
from rdrobust import rdrobust, rdplot, rdbwselect
from rddensity import rddensity
import statsmodels.formula.api as smf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

Quick Template
```python
# 1. Basic RDD
result = rdrobust(y=Y, x=X, c=cutoff)
print(result)

# 2. Plotting
rdplot(y=Y, x=X, c=cutoff)

# 3. Fuzzy RDD
result_fuzzy = rdrobust(y=Y, x=X, c=cutoff, fuzzy=D)

# 4. Bandwidth selection
bw = rdbwselect(y=Y, x=X, c=cutoff)
print(f"Optimal bandwidth: {bw.bws[0]}")

# 5. Density test
density_test = rddensity(X=X, c=cutoff)
print(f"p-value: {density_test.pval[0]}")
```

Practice Exercises
Exercise 1: Conceptual Understanding
- Explain: Why is RDD called a "local RCT"?
- Distinguish: What is the essential difference between Sharp RDD and Fuzzy RDD?
- Assumption: Why can't the continuity assumption be directly tested? What can we do?
Exercise 2: Data Analysis
Data generation:
```python
np.random.seed(123)
n = 1000
X = np.random.uniform(-50, 50, n)
D = (X >= 0).astype(int)
Y = 50 + 0.5*X + 0.01*X**2 + 10*D + np.random.normal(0, 5, n)
```

Tasks:
- Draw RDD plot
- Estimate treatment effect (using rdrobust)
- Conduct covariate balance tests (generate your own covariates)
- Conduct density test
- Conduct bandwidth sensitivity analysis
Exercise 3: Case Replication
Choose one of the following classic papers and attempt to replicate main results:
- Angrist & Lavy (1999)
- Lee (2008)
- Carpenter & Dobkin (2009)
Hint: Data for many papers are available on the authors' or journals' websites.
From RDD to Broader Causal Inference
RDD's Position in the Causal Inference Toolkit
| Method | Data Requirement | Core Assumption | External Validity | Typical Applications |
|---|---|---|---|---|
| RCT | Experimental data | Random assignment | May be lower | Medicine, development economics |
| RDD | Cutoff rule | Continuity assumption | Local effect | Education, politics, public health |
| DID | Panel data | Parallel trends | May be broader | Policy evaluation |
| IV | Instrumental variable | Exclusion, relevance | Compliers | Returns to education, health |
When to Use RDD?
Ideal scenarios:
- Clear cutoff rule exists (policy, law, institution)
- Cutoff is exogenous (non-manipulable)
- Sufficient observations near cutoff
- Care about local effect (at cutoff)
Unsuitable scenarios:
- Cutoff fuzzy or subjective
- Severe manipulation (election fraud, exam cheating)
- Too few observations near cutoff
- Need overall ATE
Next Steps in Learning
Advanced Topics
- Multidimensional RDD: Multiple running variables
- Dynamic RDD: Treatment effects varying over time
- Kink RDD: Slope change at cutoff (not level jump)
- Geographic RDD: Geographic boundaries as cutoffs
- Machine Learning + RDD: Using ML to estimate conditional expectation functions
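The kink design from the list above can be sketched much like a sharp RDD, except that the two local linear fits are differenced in slope rather than in level. A minimal simulation, with invented values and a uniform kernel:

```python
import numpy as np

# Sketch of a regression *kink* design: the slope of y in x changes at
# the cutoff (here x = 0), with no level jump.
rng = np.random.default_rng(4)
n, h = 5000, 0.5
x = rng.uniform(-1, 1, n)
# Slope 0.5 below the cutoff, 1.5 above: a kink of +1.0, no jump.
y = 0.5 * x + 1.0 * np.maximum(x, 0) + rng.normal(0, 0.2, n)

left = (x >= -h) & (x < 0)
right = (x >= 0) & (x <= h)
slope_left = np.polyfit(x[left], y[left], 1)[0]    # leading coef = slope
slope_right = np.polyfit(x[right], y[right], 1)[0]
kink_hat = slope_right - slope_left                # estimated slope change
```

(rdrobust also supports estimating derivatives at the cutoff via its `deriv` option, which is the production-quality route for kink designs.)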
Related Methods
- Synthetic Control: Constructing counterfactuals
- Matching: Finding comparable control groups
- Event Study: Dynamic effects
Final Words
Josh Angrist's advice:
"Econometrics is about using the right tool for the job. RDD is a powerful tool when you have a cutoff, but it's not a panacea. Always think carefully about identification."
The essence of RDD:
- Simple yet powerful: Intuitive concept, credible identification
- Local yet precise: Sacrifice external validity for internal validity
- Transparent and testable: Assumptions can be verified indirectly
Remember:
- Seek clear cutoff rules
- Rigorously test continuity assumption
- Report robustness analyses
- Interpret carefully (LATE, not ATE)
Conclusion
Congratulations on completing Module 11! You have now mastered:
- Core ideas and identification logic of RDD
- Differences between Sharp and Fuzzy RDD
- Validity and robustness testing
- Replication of classic cases
- Python implementation best practices
You now possess:
- Ability to identify RDD research opportunities
- Skills to rigorously implement RDD analysis
- Foundation to publish RDD research in top journals
Next steps:
- Read more classic RDD papers
- Seek cutoff rules in your research
- Practice, practice, practice!
RDD: Local randomization, a powerful tool for causal inference!
Chapter Complete
Thank you for studying Module 11!
For questions or suggestions, please feel free to discuss.
Happy Coding & Happy Researching!