11.1 Chapter Introduction (Regression Discontinuity Design)
Local randomization: when nature creates quasi-experiments for us
Learning Objectives
After completing this chapter, you will be able to:
- Understand the core ideas of RDD and the principle of local randomization
- Master the differences and applications of Sharp RDD and Fuzzy RDD
- Implement RDD validity tests (continuity assumption, density test, covariate balance)
- Conduct bandwidth selection and robustness analysis
- Use Python to implement RDD analysis (rdrobust, statsmodels)
- Replicate classic RDD studies (Angrist & Lavy 1999, Lee 2008, etc.)
Why is RDD the "Most Credible" Quasi-Experimental Method?
Starting with the Counterfactual Idea
Josh Angrist's Perspective:
"RDD is the most credible quasi-experimental design, because it mimics a randomized experiment in a local neighborhood of the cutoff."
In causal inference, we care most about the counterfactual question:
- Observed outcome: A student's GPA after receiving a scholarship
- Counterfactual: What would this student's GPA be if they didn't receive the scholarship?
Problem: We can never observe both states simultaneously! (The fundamental problem of causal inference)
RCT's solution:
- Randomly assign treatment, ensuring treatment and control groups are completely comparable
- Average outcome in treatment group - Average outcome in control group = Average Treatment Effect (ATE)
RDD's clever approach: When we cannot conduct randomized experiments, if there exists a cutoff rule, individuals near the cutoff are almost "random"!
The Core Intuition of RDD
Scenario: College Scholarships and Student Performance
Suppose a university has the following rule:
- College entrance exam score ≥ 600 → Receive scholarship
- College entrance exam score < 600 → No scholarship
Research question: Does the scholarship improve students' college GPA?
Intuition:
- A student with 599 points vs a student with 600 points
- These two students are almost identical (ability, family background, study habits, etc.)
- The only difference: one just crossed the cutoff and received a scholarship
- Therefore, the difference in their GPAs can be attributed to the causal effect of the scholarship!
Illustration: An Ideal RDD
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Set font for Chinese characters
plt.rcParams['font.sans-serif'] = ['SimHei', 'DejaVu Sans']
plt.rcParams['axes.unicode_minus'] = False
sns.set_style("whitegrid")
# Set random seed
np.random.seed(42)
# Generate running variable
x = np.linspace(-50, 50, 1000)
cutoff = 0
# Generate outcome variable
# Left of cutoff (untreated)
y_left = 60 + 0.5 * x[x < cutoff] + np.random.normal(0, 3, sum(x < cutoff))
# Right of cutoff (treated): jump of 10 points
y_right = 70 + 0.5 * x[x >= cutoff] + np.random.normal(0, 3, sum(x >= cutoff))
# Fit polynomials (for drawing smooth curves)
from numpy.polynomial import Polynomial
p_left = Polynomial.fit(x[x < cutoff], y_left, deg=2)
p_right = Polynomial.fit(x[x >= cutoff], y_right, deg=2)
# Plot
fig, ax = plt.subplots(figsize=(14, 8))
# Scatter plot
ax.scatter(x[x < cutoff], y_left, alpha=0.4, s=20, color='blue', label='No Scholarship')
ax.scatter(x[x >= cutoff], y_right, alpha=0.4, s=20, color='red', label='Scholarship')
# Fitted curves
x_left_smooth = np.linspace(x.min(), cutoff, 100)
x_right_smooth = np.linspace(cutoff, x.max(), 100)
ax.plot(x_left_smooth, p_left(x_left_smooth), color='blue', linewidth=3, label='Left Fitted Line')
ax.plot(x_right_smooth, p_right(x_right_smooth), color='red', linewidth=3, label='Right Fitted Line')
# Mark cutoff
ax.axvline(x=cutoff, color='green', linestyle='--', linewidth=2.5, alpha=0.8)
ax.text(cutoff + 2, 45, 'Cutoff', fontsize=14, color='green',
fontweight='bold', ha='left')
# Annotate RDD effect
y_left_at_cutoff = p_left(cutoff)
y_right_at_cutoff = p_right(cutoff)
rdd_effect = y_right_at_cutoff - y_left_at_cutoff
ax.annotate('', xy=(cutoff + 0.5, y_right_at_cutoff),
xytext=(cutoff + 0.5, y_left_at_cutoff),
arrowprops=dict(arrowstyle='<->', color='purple', lw=3.5))
ax.text(cutoff + 3, (y_left_at_cutoff + y_right_at_cutoff) / 2,
f'RDD Effect\nτ = {rdd_effect:.1f}',
fontsize=13, color='purple', fontweight='bold',
bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.4))
# Legend and labels
ax.set_xlabel('Running Variable (Exam Score - 600)', fontsize=14, fontweight='bold')
ax.set_ylabel('Outcome Variable (College GPA)', fontsize=14, fontweight='bold')
ax.set_title('Core Logic of Regression Discontinuity Design (RDD)', fontsize=16, fontweight='bold', pad=20)
ax.legend(loc='upper left', fontsize=12)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('rdd_illustration.png', dpi=300, bbox_inches='tight')
plt.show()

Key observations:
- Left of cutoff: Outcome variable follows a smooth curve
- Right of cutoff: Outcome variable follows another smooth curve
- At the cutoff: A clear jump (discontinuity) appears
- RDD effect: The magnitude of the jump is the treatment effect!
Mathematical Expression of RDD
Potential Outcomes Framework
Notation:
- $X_i$: Running Variable (e.g., exam score)
- $c$: Cutoff (e.g., 600 points)
- $D_i \in \{0, 1\}$: Treatment Status
- $Y_i(0)$: Potential outcome (if untreated)
- $Y_i(1)$: Potential outcome (if treated)
- $Y_i$: Observed outcome
Sharp RDD: Treatment Completely Determined by Cutoff
Definition: In a Sharp RDD, treatment status is deterministically assigned by whether the running variable crosses the cutoff:

$$D_i = \mathbf{1}(X_i \geq c)$$

Key assumption: Continuity Assumption

Assume that the potential outcome functions are continuous at the cutoff:

$$\lim_{x \to c^-} E[Y_i(0) \mid X_i = x] = \lim_{x \to c^+} E[Y_i(0) \mid X_i = x]$$

(and similarly for $E[Y_i(1) \mid X_i = x]$).

In plain language: Without treatment, the outcome variable would be smooth at the cutoff (no jump).

Identification strategy:

Observed outcomes combine the two potential outcomes:

$$Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$$

RDD Estimator:

$$\tau_{SRD} = \lim_{x \to c^+} E[Y_i \mid X_i = x] - \lim_{x \to c^-} E[Y_i \mid X_i = x]$$

Why is this a causal effect?

By the continuity assumption, the left limit identifies $E[Y_i(0) \mid X_i = c]$ and the right limit identifies $E[Y_i(1) \mid X_i = c]$, so:

$$\tau_{SRD} = E[Y_i(1) - Y_i(0) \mid X_i = c]$$

Important: RDD identifies the average treatment effect at the cutoff, not the overall ATE!
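To make the estimator concrete, here is a minimal sketch (simulated data and an arbitrary window width, not part of the chapter's main example) that approximates the two one-sided limits with sample means inside a narrow window. Note how the slope of the regression function leaks into this naive difference (by roughly slope × window width), which is exactly why practice moves to local linear regression:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-50, 50, 5000)                               # running variable
Y = 50 + 0.5 * X + 10 * (X >= 0) + rng.normal(0, 5, 5000)    # true jump = 10

h = 5  # narrow window around the cutoff (arbitrary choice here)
left = Y[(X < 0) & (X >= -h)]    # observations just below the cutoff
right = Y[(X >= 0) & (X <= h)]   # observations just above the cutoff

# Naive difference in local means: biased upward by about slope * h = 2.5
tau_naive = right.mean() - left.mean()
print(f"Naive local-means estimate: {tau_naive:.2f} (true effect = 10)")
```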
RDD vs RCT: The Perspective of Local Randomization
RDD as a "Local RCT"
Josh Angrist's perspective:
"RDD can be thought of as a local randomized experiment. Near the cutoff, treatment assignment is 'as-if random'."
Intuition:
- Far from cutoff: High-scoring and low-scoring students differ greatly (ability, family background, etc.)
- Close to cutoff: Students with 599 and 600 points are almost identical
- At the cutoff: Treatment assignment is almost random (who gets exactly 600 has a luck component)
Formalization:

Within a small neighborhood of the cutoff, assume treatment is as good as randomly assigned:

$$(Y_i(0), Y_i(1)) \perp D_i \quad \text{for } X_i \in [c - \epsilon, c + \epsilon]$$

This is similar to balance in an RCT: treatment and control groups are similar on all covariates.
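One way to probe local randomization is a covariate balance check in a narrow window: covariates determined before treatment should have similar means on both sides. A minimal sketch on simulated data (the covariate `age` and the window width are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.uniform(-50, 50, 5000)                   # running variable
age = 20 + 0.02 * X + rng.normal(0, 2, 5000)     # covariate: smooth in X, no jump at 0

h = 5
below = age[(X < 0) & (X >= -h)]
above = age[(X >= 0) & (X <= h)]

t, p = stats.ttest_ind(above, below)
print(f"Mean age below: {below.mean():.2f}, above: {above.mean():.2f}")
print(f"t = {t:.2f}, p = {p:.3f}  (a large p-value is consistent with local balance)")
```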
RDD vs DID: When to Use Which Method?
| Feature | RDD | DID |
|---|---|---|
| Data requirement | Cross-section or single-period panel | Multi-period panel (at least 2 periods) |
| Identification source | Jump at cutoff | Double difference in time and group |
| Core assumption | Continuity assumption | Parallel trends assumption |
| External validity | Local effect (at cutoff) | May be broader |
| Internal validity | Very high (close to RCT) | Depends on parallel trends |
| Classic cases | Scholarships, elections | Minimum wage, environmental policy |
Rule of thumb:
- If there's a clear cutoff rule → Use RDD
- If there's spatial-temporal variation in policy → Use DID
- If you can conduct random assignment → Do an RCT directly!
Empirical Implementation of Sharp RDD
Linear Regression Approach
The simplest RDD estimation: fit two linear regression lines near the cutoff.
Model:

$$Y_i = \alpha + \tau D_i + \beta_1 (X_i - c) + \beta_2 D_i (X_i - c) + \epsilon_i$$

Parameter interpretation:
- $\tau$: RDD effect (jump at cutoff) ⭐
- $\beta_1$: Slope left of cutoff
- $\beta_2$: Additional slope right of cutoff (total slope to the right = $\beta_1 + \beta_2$)

Key: Center the running variable ($\tilde{X}_i = X_i - c$), so that $\tau$ measures the jump exactly at the cutoff.
Polynomial Approach
Allow the relationship between outcome and running variable to be nonlinear:

$$Y_i = \alpha + \tau D_i + \sum_{k=1}^{p} \beta_k (X_i - c)^k + \sum_{k=1}^{p} \gamma_k D_i (X_i - c)^k + \epsilon_i$$

Warning: High-order polynomials ($p \geq 3$) are prone to overfitting and can produce noisy, misleading jump estimates (Gelman & Imbens 2019)!
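A low-order polynomial fit is easy to express with a patsy formula, interacting the treatment dummy with the polynomial terms so each side of the cutoff gets its own curve. A minimal sketch on simulated data (the quadratic DGP is assumed for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
X = rng.uniform(-50, 50, 2000)
D = (X >= 0).astype(int)
# Mildly curved DGP with a jump of 10 at the cutoff
Y = 50 + 0.5 * X - 0.002 * X**2 + 10 * D + rng.normal(0, 5, 2000)
df = pd.DataFrame({'Y': Y, 'D': D, 'Xc': X})

# Quadratic on each side: D interacted with first- and second-order terms
m = smf.ols('Y ~ D * (Xc + I(Xc**2))', data=df).fit()
print(f"Quadratic RDD estimate: {m.params['D']:.2f} (true effect = 10)")
```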
Local Linear Regression
Modern best practice (Calonico, Cattaneo, Titiunik 2014):
- Choose a bandwidth $h$: use only observations with $|X_i - c| \leq h$
- Kernel weighting: observations closer to the cutoff receive higher weight (e.g., a triangular kernel)
- Fit a local linear regression within the bandwidth:

$$Y_i = \alpha + \tau D_i + \beta_1 (X_i - c) + \beta_2 D_i (X_i - c) + \epsilon_i, \qquad |X_i - c| \leq h$$

Advantages:
- Optimal bias-variance tradeoff
- Minimal functional form assumptions
- Modern packages (like rdrobust) implement this automatically (a hand-rolled sketch follows below)
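For intuition, local linear regression with a triangular kernel can be hand-rolled as weighted least squares. This sketch fixes the bandwidth by hand, whereas rdrobust chooses it optimally and adjusts the inference:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
X = rng.uniform(-50, 50, 3000)
D = (X >= 0).astype(int)
Y = 50 + 0.5 * X + 10 * D + rng.normal(0, 5, 3000)

h = 10                                    # bandwidth (fixed here; rdrobust selects it)
df = pd.DataFrame({'Y': Y, 'D': D, 'Xc': X})
local = df[df['Xc'].abs() <= h].copy()
local['w'] = 1 - local['Xc'].abs() / h    # triangular kernel: weight falls to 0 at |Xc| = h

m = smf.wls('Y ~ D * Xc', data=local, weights=local['w']).fit()
print(f"Local linear (triangular kernel) estimate: {m.params['D']:.2f} (true effect = 10)")
```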
Python Implementation: Simple Example
Simulate Sharp RDD Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf
from scipy import stats
# Setup
np.random.seed(123)
n = 1000
cutoff = 0
# Generate running variable
X = np.random.uniform(-50, 50, n)
# Generate treatment status
D = (X >= cutoff).astype(int)
# Generate outcome variable
# True DGP: Y = 50 + 0.5*X + 10*D + noise
# This means treatment effect = 10
true_effect = 10
Y = 50 + 0.5 * X + true_effect * D + np.random.normal(0, 5, n)
# Create dataframe
df = pd.DataFrame({
'X': X,
'D': D,
'Y': Y,
'X_centered': X - cutoff
})
print("=" * 70)
print("Sharp RDD Simulated Data")
print("=" * 70)
print(f"Sample size: {n}")
print(f"Cutoff: {cutoff}")
print(f"True treatment effect: {true_effect}")
print(f"Treated units: {D.sum()} ({D.sum()/n*100:.1f}%)")
print("\nData preview:")
print(df.head(10))

Visualization: Scatter Plot + Fitted Lines
# Fit separately by group
df_left = df[df['D'] == 0]
df_right = df[df['D'] == 1]
# OLS fit (fit on plain arrays so predicting with plain arrays below
# doesn't trigger sklearn feature-name warnings)
from sklearn.linear_model import LinearRegression
lr_left = LinearRegression().fit(df_left[['X_centered']].values, df_left['Y'].values)
lr_right = LinearRegression().fit(df_right[['X_centered']].values, df_right['Y'].values)
# Predict
X_left_range = np.linspace(df_left['X_centered'].min(), 0, 100).reshape(-1, 1)
X_right_range = np.linspace(0, df_right['X_centered'].max(), 100).reshape(-1, 1)
Y_left_pred = lr_left.predict(X_left_range)
Y_right_pred = lr_right.predict(X_right_range)
# Plot
fig, ax = plt.subplots(figsize=(14, 8))
# Scatter plot (using binning to reduce visual clutter)
bins = 20
df['X_bin'] = pd.cut(df['X_centered'], bins=bins)
# observed=True: keep only bin-by-D combinations that actually occur
df_binned = df.groupby(['X_bin', 'D'], observed=True).agg({'Y': 'mean', 'X_centered': 'mean'}).reset_index()
df_binned_left = df_binned[df_binned['D'] == 0]
df_binned_right = df_binned[df_binned['D'] == 1]
ax.scatter(df_binned_left['X_centered'], df_binned_left['Y'],
s=100, alpha=0.6, color='blue', edgecolors='black', linewidths=1.5,
label='Untreated (binned means)')
ax.scatter(df_binned_right['X_centered'], df_binned_right['Y'],
s=100, alpha=0.6, color='red', edgecolors='black', linewidths=1.5,
label='Treated (binned means)')
# Fitted lines
ax.plot(X_left_range, Y_left_pred, color='blue', linewidth=3, label='Left Fitted Line')
ax.plot(X_right_range, Y_right_pred, color='red', linewidth=3, label='Right Fitted Line')
# Cutoff
ax.axvline(x=0, color='green', linestyle='--', linewidth=2.5, alpha=0.7)
# Annotate effect
y_left_at_cutoff = lr_left.predict([[0]])[0]
y_right_at_cutoff = lr_right.predict([[0]])[0]
estimated_effect = y_right_at_cutoff - y_left_at_cutoff
ax.annotate('', xy=(0.5, y_right_at_cutoff), xytext=(0.5, y_left_at_cutoff),
arrowprops=dict(arrowstyle='<->', color='purple', lw=3))
ax.text(1, (y_left_at_cutoff + y_right_at_cutoff) / 2,
f'Estimated Effect\n= {estimated_effect:.2f}',
fontsize=12, color='purple', fontweight='bold',
bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.3))
ax.set_xlabel('X - Cutoff', fontsize=13, fontweight='bold')
ax.set_ylabel('Y', fontsize=13, fontweight='bold')
ax.set_title(f'Sharp RDD Example (True Effect = {true_effect})',
fontsize=15, fontweight='bold')
ax.legend(loc='upper left', fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Regression Estimation
# Method 1: Full sample linear RDD
model1 = smf.ols('Y ~ D + X_centered + D:X_centered', data=df).fit()
print("\n" + "=" * 70)
print("Method 1: Full Sample Linear RDD")
print("=" * 70)
print(model1.summary().tables[1])
print(f"\nEstimated RDD effect: {model1.params['D']:.3f}")
print(f"Standard error: {model1.bse['D']:.3f}")
print(f"95% Confidence interval: [{model1.conf_int().loc['D', 0]:.3f}, {model1.conf_int().loc['D', 1]:.3f}]")
# Method 2: Bandwidth restriction (use only observations near cutoff)
bandwidth = 20
df_local = df[np.abs(df['X_centered']) <= bandwidth].copy()
model2 = smf.ols('Y ~ D + X_centered + D:X_centered', data=df_local).fit()
print("\n" + "=" * 70)
print(f"Method 2: Local Linear RDD (bandwidth = {bandwidth})")
print("=" * 70)
print(f"Observations used: {len(df_local)} / {len(df)} ({len(df_local)/len(df)*100:.1f}%)")
print(model2.summary().tables[1])
print(f"\nEstimated RDD effect: {model2.params['D']:.3f}")
print(f"Standard error: {model2.bse['D']:.3f}")
# Comparison
print("\n" + "=" * 70)
print("Effect Estimate Comparison")
print("=" * 70)
print(f"True effect: {true_effect:.3f}")
print(f"Full sample estimate: {model1.params['D']:.3f} (SE = {model1.bse['D']:.3f})")
print(f"Local estimate (h={bandwidth}): {model2.params['D']:.3f} (SE = {model2.bse['D']:.3f})")Output interpretation:
- Both methods should be close to the true effect of 10
- Local estimate typically has larger standard errors (smaller sample size)
- But local estimate has smaller bias (weaker functional form assumptions); a bandwidth-sensitivity check is sketched below
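A simple robustness habit, continuing with the simulated `df` and the `smf` import from above: re-estimate the effect over a grid of bandwidths (the grid here is arbitrary) and check that the estimate is stable:

```python
# Sensitivity check: re-estimate the RDD effect across several bandwidths
for h in [5, 10, 20, 30, 40, 50]:
    sub = df[df['X_centered'].abs() <= h]
    m = smf.ols('Y ~ D + X_centered + D:X_centered', data=sub).fit()
    lo, hi = m.conf_int().loc['D']
    print(f"h = {h:>2}: tau = {m.params['D']:6.3f}, 95% CI [{lo:6.3f}, {hi:6.3f}], n = {len(sub)}")
```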
Fuzzy RDD: Imperfect Cutoffs
What is Fuzzy RDD?
In reality, the cutoff rule may be imperfect:
- Sharp RDD: $P(D_i = 1 \mid X_i \geq c) = 1$ and $P(D_i = 1 \mid X_i < c) = 0$
- Fuzzy RDD: the treatment probability $P(D_i = 1 \mid X_i = x)$ jumps at the cutoff, but is not 0 or 1 on either side
Examples:
- College admission: Cutoff at 600, but special cases (sports, minority bonuses, etc.)
- Medicare: Auto-enrollment at age 65, but some purchase early
Fuzzy RDD Identification
Idea: Use the cutoff as an instrumental variable (IV)!
Two-stage regression:
First stage: Use the cutoff indicator $Z_i = \mathbf{1}(X_i \geq c)$ to predict treatment status:

$$D_i = \gamma_0 + \gamma_1 Z_i + \gamma_2 (X_i - c) + \gamma_3 Z_i (X_i - c) + v_i$$

Second stage: Use the predicted treatment to estimate the effect:

$$Y_i = \alpha + \tau \hat{D}_i + \beta_1 (X_i - c) + \beta_2 Z_i (X_i - c) + \epsilon_i$$

Fuzzy RDD estimator:

$$\tau_{FRD} = \frac{\lim_{x \to c^+} E[Y_i \mid X_i = x] - \lim_{x \to c^-} E[Y_i \mid X_i = x]}{\lim_{x \to c^+} E[D_i \mid X_i = x] - \lim_{x \to c^-} E[D_i \mid X_i = x]}$$
Interpretation:
- Numerator: Jump in outcome at cutoff (Reduced Form)
- Denominator: Jump in treatment at cutoff (First Stage)
- Ratio: Local average treatment effect (LATE)
Connection to IV: Fuzzy RDD is essentially IV estimation, with the cutoff indicator $Z_i = \mathbf{1}(X_i \geq c)$ serving as the instrument!
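The ratio form suggests a direct implementation: estimate the jump in the outcome (reduced form) and the jump in treatment (first stage) with local linear regressions, then divide. A minimal sketch on simulated data with imperfect compliance (the compliance probabilities and bandwidth are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 5000
X = rng.uniform(-50, 50, n)
Z = (X >= 0).astype(int)                  # crossing the cutoff (the instrument)
# Imperfect compliance: crossing raises treatment probability from 0.2 to 0.8
D = (rng.uniform(size=n) < 0.2 + 0.6 * Z).astype(int)
Y = 50 + 0.5 * X + 10 * D + rng.normal(0, 5, n)
df = pd.DataFrame({'Y': Y, 'D': D, 'Z': Z, 'Xc': X})

h = 15
local = df[df['Xc'].abs() <= h]
reduced = smf.ols('Y ~ Z * Xc', data=local).fit().params['Z']   # jump in Y at cutoff
first = smf.ols('D ~ Z * Xc', data=local).fit().params['Z']     # jump in D at cutoff
print(f"Reduced form: {reduced:.2f}, First stage: {first:.2f}")
print(f"Fuzzy RDD (Wald) estimate: {reduced / first:.2f} (true effect = 10)")
```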
Preview of Classic RDD Applications
Case 1: Thistlethwaite & Campbell (1960) - Birth of RDD
Research question: Does receiving a National Merit Award affect students' later scholarship attainment?
Design:
- Cutoff: Threshold on national exam score
- Treatment: Receive merit award
- Outcome: Number of subsequent scholarships
Finding: Significant positive RDD effect
Historical significance: This was the first application of RDD (1960)!
Case 2: Angrist & Lavy (1999) - Class Size and Student Achievement
Research question: Does reducing class size improve student performance?
Background:
- Israel has a rule (Maimonides' Rule): Class size cannot exceed 40 students
- If school has 41 students → Must split into 2 classes (~20 each)
- If school has 40 students → 1 class (40 students)
Design:
- Running variable: Total enrollment in school
- Cutoffs: 40, 80, 120, ... (multiples of 40)
- Treatment: Class size (determined by rule)
- Outcome: Standardized test scores
Finding: Smaller classes significantly improve test scores for fourth and fifth graders; the implied gains from large class-size reductions are on the order of 0.1-0.2 standard deviations
Innovation: Classic application of Fuzzy RDD (rule not perfectly enforced)
Case 3: Lee (2008) - Electoral Advantage and Re-election
Research question: Does incumbency status confer re-election advantage?
Design:
- Cutoff: Vote share = 50%
- Treatment: Become incumbent
- Outcome: Victory (and vote share) in the next election
Key intuition:
- Candidate with 49.9% vs candidate with 50.1%
- These two are almost identical (political strength, funding, voter support, etc.)
- Only difference: One wins, one loses
Finding: A huge incumbency advantage: winning raises the probability of winning the next election by roughly 45 percentage points!
Core Assumptions of RDD
Assumption 1: Continuity Assumption ⭐
Assumption: At the cutoff, all factors except treatment status are continuous.
Mathematical expression: $E[Y_i(0) \mid X_i = x]$ and $E[Y_i(1) \mid X_i = x]$ are continuous in $x$ at $x = c$.

In plain language: Without treatment, the outcome variable would not jump at the cutoff.
How to test? (Section 3 discusses in detail)
- Covariate balance tests: Check if covariates (age, gender, etc.) are balanced on both sides of cutoff
- Density test (McCrary Test): Check if density of running variable is smooth at cutoff
- Placebo tests: Re-estimate the effect at false cutoffs, where no jump should appear (sketched below)
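As a preview of the placebo idea (Section 3 covers it fully), this sketch re-estimates the "effect" at fake cutoffs on simulated data, staying on one side of the true cutoff so the real jump cannot contaminate the test; the estimates should be near zero:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
X = rng.uniform(-50, 50, 3000)
Y = 50 + 0.5 * X + 10 * (X >= 0) + rng.normal(0, 5, 3000)
df = pd.DataFrame({'Y': Y, 'X': X})

# Fake cutoffs away from the true one (0); keep only one side of the real cutoff
for c in [-25, -10, 10, 25]:
    side = df[df['X'] < 0] if c < 0 else df[df['X'] >= 0]
    sub = side.assign(Xc=side['X'] - c, D=(side['X'] >= c).astype(int))
    m = smf.ols('Y ~ D * Xc', data=sub).fit()
    print(f"Placebo cutoff {c:>3}: tau = {m.params['D']:7.3f}, p = {m.pvalues['D']:.3f}")
```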
Assumption 2: No Precise Manipulation
Assumption: Individuals cannot precisely manipulate the running variable to just cross the cutoff.
Threats:
- Exam cheating: Students know 600 is cutoff, cheat to get exactly 600
- Election fraud: Candidates manipulate votes to get just over 50%
- Policy lobbying: Firms lobby government to stay just below regulatory threshold
How to test?
- McCrary density test: Check for abnormal bunching of the running variable at the cutoff (a crude count-based version is sketched below)
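The formal McCrary test compares local polynomial density estimates on each side of the cutoff; as a crude first look, one can simply compare observation counts in symmetric windows. A minimal sketch (window width arbitrary, data simulated without manipulation):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(-50, 50, 5000)   # no manipulation: density is smooth through the cutoff

h = 5
n_below = np.sum((X >= -h) & (X < 0))
n_above = np.sum((X >= 0) & (X <= h))
print(f"Counts just below / above cutoff: {n_below} / {n_above} "
      f"(ratio {n_above / n_below:.2f})")
# A ratio far from 1 suggests bunching; the formal test compares estimated
# densities at the cutoff rather than raw counts.
```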
Assumption 3: Local Exclusion
Assumption: Near the cutoff, nothing other than treatment status changes discontinuously; the running variable's own smooth effect on the outcome is absorbed by the regression function.

Threat:
- If crossing the cutoff triggers other changes besides the scholarship (e.g., the same 600-point threshold grants honors placement, or crossing it boosts confidence), the RDD estimate bundles those effects together
Rule of thumb: Choose "exogenous" running variables (birth date, lottery number), and check that no other program uses the same cutoff
Chapter Structure
Section 1: Chapter Introduction (Current)
- Core ideas of RDD and counterfactual framework
- Sharp RDD vs Fuzzy RDD
- Comparison with RCT and DID
- Python basic implementation
Section 2: RDD Fundamentals and Identification
- Mathematical derivation of Sharp RDD
- Fuzzy RDD and instrumental variables
- Local average treatment effect (LATE)
- Linear vs nonparametric methods
Section 3: Continuity Assumption and Validity Tests
- Testing the continuity assumption
- Covariate balance tests
- Density test (McCrary Test)
- Placebo tests
Section 4: Bandwidth Selection and Robustness Tests
- Optimal bandwidth selection (IK, CCT)
- Sensitivity analysis
- Polynomial order selection
- Donut-hole RDD
Section 5: Classic Cases and Python Implementation
- Angrist & Lavy (1999) Class size
- Lee (2008) Electoral advantage
- Carpenter & Dobkin (2009) Minimum drinking age
- Best practices using rdrobust package
Section 6: Chapter Summary
- RDD methodology summary
- Common pitfalls and best practices
- Practice exercises
- Literature recommendations
Python Toolkit
Core Libraries
| Package | Main Functions | Installation |
|---|---|---|
| pandas | Data manipulation | pip install pandas |
| numpy | Numerical computation | pip install numpy |
| statsmodels | OLS regression | pip install statsmodels |
| rdrobust | RDD optimal bandwidth and robust inference | pip install rdrobust |
| rddtools | RDD toolkit | (Install from source) |
| matplotlib | Visualization | pip install matplotlib |
| seaborn | Advanced visualization | pip install seaborn |
Basic Setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats
# Font settings (choose based on OS)
plt.rcParams['font.sans-serif'] = ['SimHei', 'Arial Unicode MS', 'DejaVu Sans']
plt.rcParams['axes.unicode_minus'] = False
# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 7)
pd.set_option('display.float_format', '{:.4f}'.format)

rdrobust Package Installation
# Install from PyPI
pip install rdrobust
# Or using conda
conda install -c conda-forge rdrobust

Usage example:
from rdrobust import rdrobust, rdbwselect, rdplot
# Automatic bandwidth selection and robust inference
result = rdrobust(y=Y, x=X, c=cutoff)
print(result)
# Plot RDD
rdplot(y=Y, x=X, c=cutoff, nbins=20)

Essential Reading
Foundational Papers
Thistlethwaite, D. L., & Campbell, D. T. (1960). "Regression-discontinuity analysis: An alternative to the ex post facto experiment." Journal of Educational Psychology, 51(6), 309-317.
- Birth of RDD method
Hahn, J., Todd, P., & Van der Klaauw, W. (2001). "Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design." Econometrica, 69(1), 201-209.
- Modern identification theory for RDD
Lee, D. S., & Lemieux, T. (2010). "Regression Discontinuity Designs in Economics." Journal of Economic Literature, 48(2), 281-355.
- Must-read review, the bible of RDD
Methodological Breakthroughs
Imbens, G., & Kalyanaraman, K. (2012). "Optimal Bandwidth Choice for the Regression Discontinuity Estimator." Review of Economic Studies, 79(3), 933-959.
- Optimal bandwidth selection (IK method)
Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). "Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs." Econometrica, 82(6), 2295-2326.
- Robust inference (CCT method)
Gelman, A., & Imbens, G. (2019). "Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs." Journal of Business & Economic Statistics, 37(3), 447-456.
- Warning: Don't use high-order polynomials!
Classic Applications
Angrist, J. D., & Lavy, V. (1999). "Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement." Quarterly Journal of Economics, 114(2), 533-575.
Lee, D. S. (2008). "Randomized Experiments from Non-random Selection in U.S. House Elections." Journal of Econometrics, 142(2), 675-697.
Recommended Textbooks
- Angrist & Pischke (2009). Mostly Harmless Econometrics, Chapter 6
- Cunningham (2021). Causal Inference: The Mixtape, Chapter 6
- Huntington-Klein (2022). The Effect, Chapter 20
Ready to Begin?
RDD is the quasi-experimental method closest to randomized experiments. Master it, and you'll be able to:
- Identify causal effects in the absence of randomized experiments
- Leverage policy rules and natural cutoffs for research
- Publish high-quality causal inference studies
Remember the core idea:
"In the neighborhood of the cutoff, RDD is as good as a randomized experiment. The discontinuity is your friend." — Joshua Angrist
Let's dive into Section 2: RDD Fundamentals and Identification!
Local randomization: a powerful tool for causal inference!