11.2 RDD Fundamentals and Identification
"Regression discontinuity is one of the most credible quasi-experimental strategies."— Guido Imbens & Thomas Lemieux, RDD Review Authors
From potential outcomes framework to local randomization: the theoretical foundations of RDD
Section Overview
In this section, we will explore in depth:
- Rigorous mathematical derivation of Sharp RDD
- Relationship between Fuzzy RDD and instrumental variables (IV)
- Meaning of local average treatment effect (LATE)
- Parametric vs nonparametric estimation methods
- Bandwidth selection tradeoffs
Identification Theory of Sharp RDD
Reviewing the Potential Outcomes Framework
In the Rubin Causal Model, each individual $i$ has two potential outcomes:
- $Y_i(1)$: the outcome if treated
- $Y_i(0)$: the outcome if untreated
Individual treatment effect: $\tau_i = Y_i(1) - Y_i(0)$
Fundamental problem: we can never observe both $Y_i(1)$ and $Y_i(0)$ simultaneously!
Observed outcome: $Y_i = D_i \, Y_i(1) + (1 - D_i) \, Y_i(0)$
where $D_i \in \{0, 1\}$ is the treatment indicator.
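To make the fundamental problem concrete, here is a minimal simulation (purely illustrative; the variable names are ours, not part of the running examples below): both potential outcomes are generated, but only one is ever observed per unit.
import numpy as np

rng = np.random.default_rng(0)
Y0 = rng.normal(50, 5, size=5)      # potential outcomes if untreated
Y1 = Y0 + 10                        # potential outcomes if treated (constant effect of 10)
D = rng.binomial(1, 0.5, size=5)    # treatment indicator
Y_obs = D * Y1 + (1 - D) * Y0       # observed outcome: D*Y(1) + (1-D)*Y(0)
# We can print all four columns only because this is a simulation;
# in real data, the potential outcome not selected by D is never observed.
print(np.column_stack([D, Y0, Y1, Y_obs]))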
RDD Identification Logic
Core idea: near the cutoff $c$, treatment assignment is "quasi-random".
Sharp RDD assignment rule: $D_i = \mathbf{1}\{X_i \ge c\}$
Key assumption: Continuity Assumption
For $d \in \{0, 1\}$, assume $E[Y_i(d) \mid X_i = x]$ is continuous in $x$ at $x = c$.
Intuition:
- At the cutoff $c$, the potential outcome functions are continuous
- In other words, small changes in the running variable don't cause jumps in potential outcomes
Deriving the Identification Strategy
Observed conditional expectation: since $D_i = \mathbf{1}\{X_i \ge c\}$,
$$E[Y_i \mid X_i = x] = \begin{cases} E[Y_i(1) \mid X_i = x], & x \ge c \text{ (everyone treated)} \\ E[Y_i(0) \mid X_i = x], & x < c \text{ (everyone untreated)} \end{cases}$$
RDD estimator:
$$\tau_{SRD} = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]$$
Why is this a causal effect?
By the continuity assumption, the one-sided limits recover the potential outcome means at the cutoff:
$$\tau_{SRD} = E[Y_i(1) \mid X_i = c] - E[Y_i(0) \mid X_i = c] = E[Y_i(1) - Y_i(0) \mid X_i = c]$$
Key points:
- RDD estimates the treatment effect at the cutoff, not the overall ATE
- This is a local effect (a Local Average Treatment Effect, LATE)
- Extrapolating to other values of $X$ requires additional assumptions (e.g., effect homogeneity); a numeric sketch of the estimand follows below
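As a minimal numeric sketch of this estimand (an assumed toy DGP, not the chapter's data), the two one-sided limits can be approximated by sample means in a narrow window on each side of the cutoff; the local linear estimators introduced later refine exactly this idea.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
c = 0
X = rng.normal(0, 10, 5000)                                 # running variable
Y = 50 + 0.5 * X + 10 * (X >= c) + rng.normal(0, 5, 5000)   # true jump = 10
df_toy = pd.DataFrame({'X': X, 'Y': Y})

# Crude approximation of the one-sided limits: means within a narrow window
eps = 1.0
mean_right = df_toy.loc[(df_toy['X'] >= c) & (df_toy['X'] < c + eps), 'Y'].mean()
mean_left = df_toy.loc[(df_toy['X'] < c) & (df_toy['X'] >= c - eps), 'Y'].mean()
# Close to the true jump of 10 (window means carry a small trend bias; local linear regression removes it)
print(f"tau_SRD estimate: {mean_right - mean_left:.2f}")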
Fuzzy RDD: Imperfect Cutoff Rules
Motivation for Fuzzy RDD
In reality, treatment assignment rules may be imperfect:
Example 1: College admission
- Rule: exam score ≥ 600 → admitted
- Reality: special admissions (sports, arts, principal recommendations, etc.)
- Result: $P(D_i = 1 \mid X_i \ge 600) < 1$ and $P(D_i = 1 \mid X_i < 600) > 0$
Example 2: Medicare
- Rule: Age ≥ 65 → Automatically enrolled in Medicare
- Reality: Some purchase early, some choose not to participate
- Result: Treatment jumps at cutoff but not from 0 to 1
Definition of Fuzzy RDD
Sharp RDD: $P(D_i = 1 \mid X_i = x) = \mathbf{1}\{x \ge c\}$, so the probability jumps from 0 to 1 at the cutoff.
Fuzzy RDD: $P(D_i = 1 \mid X_i = x)$ still jumps at the cutoff,
but not necessarily from a perfect 0 to a perfect 1.
Key condition:
$$\lim_{x \downarrow c} P(D_i = 1 \mid X_i = x) \ne \lim_{x \uparrow c} P(D_i = 1 \mid X_i = x)$$
That is: the treatment probability jumps at the cutoff.
Fuzzy RDD Identification: Instrumental Variables Approach
Core idea: Use the cutoff indicator as an instrumental variable (IV)!
Define the instrument: $Z_i = \mathbf{1}\{X_i \ge c\}$
Three key IV conditions:
Relevance: $Z_i$ is correlated with treatment $D_i$
Exclusion: $Z_i$ affects $Y_i$ only through $D_i$ (after controlling for $X_i$)
Monotonicity: crossing the cutoff doesn't make anyone switch from treated to untreated
Fuzzy RDD estimator:
Reduced form: $\rho = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]$
First stage: $\pi = \lim_{x \downarrow c} E[D_i \mid X_i = x] - \lim_{x \uparrow c} E[D_i \mid X_i = x]$
Fuzzy RDD effect (Wald estimator): $\tau_{FRD} = \dfrac{\rho}{\pi}$
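A hedged sketch of the Wald estimator in code, assuming a DataFrame df_fuzzy with columns Y, D, and centered running variable X_c (such a frame is built in the complete example later in this section): estimate the reduced-form and first-stage jumps with separate local linear regressions, then take their ratio.
import numpy as np
import statsmodels.formula.api as smf

# Assumes df_fuzzy with Y (outcome), D (treatment), X_c (X - c); h is a chosen bandwidth
df_fuzzy['Z'] = (df_fuzzy['X_c'] >= 0).astype(int)   # instrument: crossed the cutoff
h = 10
local = df_fuzzy[np.abs(df_fuzzy['X_c']) <= h]

rho = smf.ols('Y ~ Z + X_c + Z:X_c', data=local).fit().params['Z']  # reduced-form jump
pi = smf.ols('D ~ Z + X_c + Z:X_c', data=local).fit().params['Z']   # first-stage jump
print(f"Wald estimate: {rho / pi:.2f}")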
Causal Interpretation of Fuzzy RDD
Local Average Treatment Effect (LATE): $\tau_{FRD} = E[Y_i(1) - Y_i(0) \mid i \text{ is a complier},\; X_i = c]$
Fuzzy RDD estimates the treatment effect for compliers.
Four types of individuals (based on potential treatment states $D_i(z)$, where $z \in \{0, 1\}$ indicates whether the cutoff is crossed):
- Always-takers: $D_i(1) = D_i(0) = 1$ (treated regardless of cutoff)
- Never-takers: $D_i(1) = D_i(0) = 0$ (untreated regardless of cutoff)
- Compliers: $D_i(1) = 1$, $D_i(0) = 0$ (treated only if they cross the cutoff)
- Defiers: $D_i(1) = 0$, $D_i(0) = 1$ (treated only if they do not cross the cutoff)
Monotonicity assumption rules out the existence of Defiers.
LATE theorem (Imbens & Angrist 1994):
$$\tau_{FRD} = E[Y_i(1) - Y_i(0) \mid D_i(1) > D_i(0),\; X_i = c]$$
Interpretation:
- Fuzzy RDD estimates the average treatment effect for compliers (those whose treatment status changes when they cross the cutoff)
- For always-takers and never-takers, we cannot identify their treatment effects
- If compliers differ from the population, LATE ≠ ATE, as the simulation sketch below illustrates
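The simulation sketch below (an illustrative DGP of our own, not the chapter's data) makes this concrete: when compliers' effect differs from everyone else's, the Wald ratio recovers the complier effect, not the population ATE.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
X = rng.normal(0, 10, n)
Z = (X >= 0).astype(int)                      # instrument: crossed the cutoff

# Potential treatment states: 20% always-takers, 50% compliers, 30% never-takers
u = rng.uniform(size=n)
D0 = (u < 0.2).astype(int)                    # treated even without crossing
D1 = (u < 0.7).astype(int)                    # treated if crossing (monotonicity holds)
D = Z * D1 + (1 - Z) * D0

# Heterogeneous effects: compliers gain 10, always-/never-takers would gain 4
tau_i = np.where((u >= 0.2) & (u < 0.7), 10.0, 4.0)
Y = 50 + tau_i * D + rng.normal(0, 5, n)      # no trend in X, so window means suffice

w = np.abs(X) <= 2                            # narrow window around the cutoff
wald = (Y[w & (Z == 1)].mean() - Y[w & (Z == 0)].mean()) / \
       (D[w & (Z == 1)].mean() - D[w & (Z == 0)].mean())
print(f"Wald ~ {wald:.2f} (complier effect = 10, population ATE = {tau_i.mean():.2f})")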
Parametric vs Nonparametric Estimation Methods
Method 1: Global Polynomial Regression
Model ($p$-th order polynomial):
$$Y_i = \alpha + \tau D_i + \sum_{k=1}^{p} \beta_k (X_i - c)^k + \sum_{k=1}^{p} \gamma_k D_i (X_i - c)^k + \varepsilon_i$$
Advantages:
- Simple and intuitive
- Uses all data, high efficiency
Disadvantages:
- Sensitive to functional form assumptions (choosing $p$ is critical)
- Data far from cutoff can bias estimates
- Gelman & Imbens (2019) warn: do not use high-order global polynomials!
Python implementation:
import statsmodels.formula.api as smf
# Second-order polynomial
model = smf.ols('Y ~ D + X_c + I(X_c**2) + D:X_c + D:I(X_c**2)', data=df).fit()
print(model.summary())
Method 2: Local Linear Regression (Recommended)
Approach: use only data with $|X_i - c| \le h$ and fit a linear model.
Model:
$$Y_i = \alpha + \tau D_i + \beta (X_i - c) + \gamma D_i (X_i - c) + \varepsilon_i$$
Bandwidth $h$:
- Too small: High variance (less data)
- Too large: High bias (functional form assumptions may be wrong)
Advantages:
- Weakest functional form assumptions (only locally linear)
- Strong theoretical properties at boundary points (Fan & Gijbels 1996)
- Modern software (like rdrobust) automatically selects the optimal bandwidth
Python implementation:
# Manual bandwidth selection
h = 10
df_local = df[np.abs(df['X_c']) <= h]
model_local = smf.ols('Y ~ D + X_c + D:X_c', data=df_local).fit()
# Using rdrobust (automatic bandwidth)
from rdrobust import rdrobust
result = rdrobust(y=df['Y'], x=df['X'], c=cutoff)
print(result)
Method 3: Kernel-Weighted Local Linear Regression
Further improvement: Give higher weight to observations closer to cutoff.
Triangular kernel function: $K(u) = \max(0, 1 - |u|)$
Weight: $w_i = K\!\left( \frac{X_i - c}{h} \right)$
Weighted regression: choose the coefficients to minimize $\sum_i w_i \big( Y_i - \alpha - \tau D_i - \beta (X_i - c) - \gamma D_i (X_i - c) \big)^2$
Python implementation:
# Calculate triangular kernel weights
df['weight'] = np.maximum(0, 1 - np.abs(df['X_c']) / h)
# Weighted least squares via the formula API (no separate WLS import needed)
model_wls = smf.wls('Y ~ D + X_c + D:X_c', data=df, weights=df['weight']).fit()
print(model_wls.summary())
Bandwidth Selection Tradeoffs
Bias-Variance Tradeoff
Small bandwidth $h$:
- Advantage: low bias (more accurate local approximation)
- Disadvantage: high variance (small sample, unstable estimate)
Large bandwidth $h$:
- Advantage: low variance (large sample, stable estimate)
- Disadvantage: high bias (may violate the local linearity assumption)
Mean squared error (MSE):
$$\text{MSE}(h) = \text{Bias}\big( \hat{\tau}(h) \big)^2 + \text{Var}\big( \hat{\tau}(h) \big)$$
Optimal bandwidth $h^*$: minimizes the MSE; the Monte Carlo sketch below makes the tradeoff concrete.
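A small Monte Carlo sketch (illustrative DGP of our own, not from the text): with a curved conditional expectation function, a narrow bandwidth yields a noisy but nearly unbiased estimate, while a wide one is stable but biased.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def rdd_estimate(df_sim, h_sim):
    """Local linear RDD estimate within bandwidth h_sim."""
    local = df_sim[np.abs(df_sim['X_c']) <= h_sim]
    return smf.ols('Y ~ D + X_c + D:X_c', data=local).fit().params['D']

rng = np.random.default_rng(3)
for h_sim in [2, 20]:
    estimates = []
    for _ in range(200):
        X = rng.normal(0, 10, 1000)
        D = (X >= 0).astype(int)
        # Curvature makes the linear approximation deteriorate over wide windows
        Y = 50 + 0.5 * X + 0.03 * X**2 + 10 * D + rng.normal(0, 5, 1000)
        estimates.append(rdd_estimate(pd.DataFrame({'X_c': X, 'D': D, 'Y': Y}), h_sim))
    estimates = np.array(estimates)
    print(f"h={h_sim:>2}: bias = {estimates.mean() - 10:+.2f}, sd = {estimates.std():.2f}")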
Optimal Bandwidth Selection Methods
1. Imbens-Kalyanaraman (IK) Method (2012)
Approach: derive the optimal bandwidth from an asymptotic expansion of the MSE.
Formula (simplified, ignoring the regularization term):
$$h_{IK} = C_K \left[ \frac{\sigma^2(c)}{f(c) \cdot \big( m''_+(c) - m''_-(c) \big)^2} \right]^{1/5} n^{-1/5}$$
where:
- $\sigma^2(c)$: residual variance at the cutoff
- $f(c)$: density of the running variable at the cutoff
- $m''_-(c)$, $m''_+(c)$: second derivatives of the conditional outcome function to the left and right of the cutoff
Python implementation:
from rdrobust import rdbwselect
# MSE-optimal bandwidth ('mserd'); rdbwselect implements the CCT refinement of the IK approach
bw = rdbwselect(y=df['Y'], x=df['X'], c=cutoff, bwselect='mserd')
print(bw)  # prints the selected main and bias bandwidths
2. Calonico-Cattaneo-Titiunik (CCT) Method (2014)
Improvements:
- Considers finite-sample bias correction
- Provides robust confidence intervals
Two bandwidths:
- Main bandwidth $h$: for point estimation
- Bias bandwidth $b$: for estimating and correcting the bias
Python implementation (rdrobust uses CCT by default):
from rdrobust import rdrobust
# CCT method (default)
result_cct = rdrobust(y=df['Y'], x=df['X'], c=cutoff)
print(result_cct)
3. Cross-Validation
Leave-one-out cross-validation:
- For each observation $i$, remove it
- Fit the model with the remaining data (using bandwidth $h$)
- Predict $\hat{Y}_i$
- Calculate the prediction error $(Y_i - \hat{Y}_i)^2$
- Repeat for all observations and choose the $h$ minimizing $CV(h) = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
Note: use only data on one side of the cutoff at a time for cross-validation (to avoid using the jump itself); the sketch below follows this rule.
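A sketch of this procedure under those caveats (loocv_bandwidth is our own illustrative helper, not a library function): it cross-validates local linear fits separately on each side of the cutoff and, for speed, evaluates the leave-one-out error on a random subset of points.
import numpy as np

def loocv_bandwidth(x, y, candidates, n_eval=200, seed=0):
    """Leave-one-out CV for a local linear bandwidth, one side of the cutoff at a time.
    Illustrative sketch: evaluates the LOO error on a random subset of points for speed."""
    rng = np.random.default_rng(seed)
    scores = {}
    for h in candidates:
        errs = []
        for side in (x < 0, x >= 0):           # cross-validate each side separately
            xs, ys = x[side], y[side]
            idx = rng.choice(len(xs), size=min(n_eval, len(xs)), replace=False)
            for i in idx:
                nb = np.abs(xs - xs[i]) <= h   # neighbours within bandwidth h
                nb[i] = False                  # leave observation i out
                if nb.sum() < 5:               # skip if too few neighbours to fit
                    continue
                slope, intercept = np.polyfit(xs[nb], ys[nb], 1)
                errs.append((ys[i] - (intercept + slope * xs[i])) ** 2)
        scores[h] = np.mean(errs)
    return min(scores, key=scores.get), scores

# Usage (assumes df with centered running variable X_c and outcome Y, as above)
h_cv, cv_scores = loocv_bandwidth(df['X_c'].to_numpy(), df['Y'].to_numpy(), [5, 10, 15, 20])
print(f"CV-selected bandwidth: {h_cv}")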
Complete Python Example: Sharp vs Fuzzy RDD
Example 1: Sharp RDD
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
from rdrobust import rdrobust, rdplot
# Setup
np.random.seed(42)
n = 2000
c = 0
# Generate running variable
X = np.random.normal(0, 10, n)
# Sharp RDD: treatment completely determined by cutoff
D = (X >= c).astype(int)
# Generate outcome variable
# DGP: Y = 50 + 0.5*X + 0.01*X^2 + 10*D + noise
true_effect = 10
Y = 50 + 0.5 * X + 0.01 * X**2 + true_effect * D + np.random.normal(0, 5, n)
df = pd.DataFrame({'X': X, 'D': D, 'Y': Y, 'X_c': X - c})
# Estimate (using rdrobust)
result_sharp = rdrobust(y=df['Y'], x=df['X'], c=c)
print("=" * 70)
print("Sharp RDD Results")
print("=" * 70)
print(result_sharp)
# Visualization
rdplot(y=df['Y'], x=df['X'], c=c,
title='Sharp RDD',
x_label='Running Variable',
y_label='Outcome')
plt.show()
Example 2: Fuzzy RDD
# Fuzzy RDD: imperfect treatment
# Crossing cutoff, treatment probability jumps from 0.2 to 0.8
np.random.seed(42)
# Potential treatment state
prob_treat = 0.2 + 0.6 * (X >= c) # 20% left, 80% right
D_fuzzy = np.random.binomial(1, prob_treat)
# Generate outcome variable (true effect still 10)
Y_fuzzy = 50 + 0.5 * X + 0.01 * X**2 + true_effect * D_fuzzy + np.random.normal(0, 5, n)
df_fuzzy = pd.DataFrame({'X': X, 'D': D_fuzzy, 'Y': Y_fuzzy, 'X_c': X - c})
# Fuzzy RDD estimation (the fuzzy= argument makes rdrobust use the IV/Wald form)
result_fuzzy = rdrobust(y=df_fuzzy['Y'], x=df_fuzzy['X'], c=c, fuzzy=df_fuzzy['D'])
print("\n" + "=" * 70)
print("Fuzzy RDD Results")
print("=" * 70)
print(result_fuzzy)
# Check first stage
print("\nFirst Stage Check:")
first_stage = rdrobust(y=df_fuzzy['D'], x=df_fuzzy['X'], c=c)
print(f"Jump in treatment probability: {first_stage.coef[0]:.3f}")
print(f"F-statistic: {first_stage.z[0]**2:.2f}")Example 3: Sensitivity Analysis with Different Bandwidths
# Try different bandwidths
bandwidths = [5, 10, 15, 20, 25]
results = []
for h in bandwidths:
    df_local = df[np.abs(df['X_c']) <= h]
    model = smf.ols('Y ~ D + X_c + D:X_c', data=df_local).fit()
    results.append({
        'bandwidth': h,
        'effect': model.params['D'],
        'se': model.bse['D'],
        'n': len(df_local)
    })
results_df = pd.DataFrame(results)
print("\n" + "=" * 70)
print("Bandwidth Sensitivity Analysis")
print("=" * 70)
print(results_df.to_string(index=False))
# Visualization
fig, ax = plt.subplots(figsize=(12, 6))
ax.errorbar(results_df['bandwidth'], results_df['effect'],
yerr=1.96 * results_df['se'],
fmt='o-', capsize=5, capthick=2, linewidth=2, markersize=8)
ax.axhline(y=true_effect, color='red', linestyle='--', linewidth=2,
label=f'True Effect = {true_effect}')
ax.set_xlabel('Bandwidth', fontsize=13, fontweight='bold')
ax.set_ylabel('Estimated RDD Effect', fontsize=13, fontweight='bold')
ax.set_title('Sensitivity to Bandwidth Choice', fontsize=15, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Statistical Inference for RDD
Computing Standard Errors
Heteroskedasticity-robust standard errors (HC1/HC2):
model = smf.ols('Y ~ D + X_c + D:X_c', data=df_local).fit(cov_type='HC2')
print(model.summary())
Clustered standard errors (if the data have a clustered structure):
# Example: clustering at the school level (assumes a hypothetical 'school_id' column)
model_cluster = smf.ols('Y ~ D + X_c + D:X_c', data=df_local).fit(
cov_type='cluster', cov_kwds={'groups': df_local['school_id']}
)
Constructing Confidence Intervals
Conventional confidence interval (based on asymptotic normality): $\hat{\tau} \pm 1.96 \cdot \widehat{SE}(\hat{\tau})$
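For the conventional interval, the usual OLS machinery already provides it; assuming the local linear model fitted in the HC2 example above:
# 95% CI for the RDD coefficient from the local linear fit above
ci = model.conf_int().loc['D']
print(f"Conventional 95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")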
Robust confidence interval (CCT method, considers finite-sample bias):
from rdrobust import rdrobust
# CCT robust confidence interval
result = rdrobust(y=df['Y'], x=df['X'], c=c)
print(f"Point estimate: {result.coef[0]:.3f}")
print(f"Robust 95% CI: [{result.ci[0][0]:.3f}, {result.ci[0][1]:.3f}]")Bootstrap confidence interval:
def rdd_estimator(data, indices):
    """Local linear RDD estimate on a bootstrap resample (bandwidth h defined above)."""
    df_boot = data.iloc[indices]
    df_boot_local = df_boot[np.abs(df_boot['X_c']) <= h]
    model = smf.ols('Y ~ D + X_c + D:X_c', data=df_boot_local).fit()
    return model.params['D']
# Bootstrap (1000 resamples, manual resampling loop)
n_boot = 1000
boot_estimates = []
for _ in range(n_boot):
    indices = np.random.choice(len(df), len(df), replace=True)
    boot_estimates.append(rdd_estimator(df, indices))
boot_estimates = np.array(boot_estimates)
ci_lower = np.percentile(boot_estimates, 2.5)
ci_upper = np.percentile(boot_estimates, 97.5)
print(f"Bootstrap 95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")Key Takeaways
Sharp RDD
- Identification condition: Continuity assumption (potential outcomes continuous at cutoff)
- Estimator: Jump in observed outcome at cutoff
- Causal interpretation: Local average treatment effect at cutoff (LATE)
- Best practice: Use local linear regression + automatic bandwidth selection (CCT)
Fuzzy RDD
- Essence: IV estimation, cutoff as instrument
- Identification conditions: Continuity + IV assumptions (relevance, exclusion, monotonicity)
- Estimator: Wald estimator (outcome jump / treatment jump)
- Causal interpretation: LATE for compliers
- Test: First stage must be strong (F > 10)
Bandwidth Selection
- Tradeoff: Bias (small bandwidth) vs variance (large bandwidth)
- Optimal method: IK or CCT (automatic data-driven)
- Robustness: Report results under multiple bandwidths
Section Summary
In this section, we learned:
- Rigorous mathematical derivation of Sharp RDD (from potential outcomes framework)
- Deep connection between Fuzzy RDD and instrumental variables
- Meaning and limitations of local average treatment effect (LATE)
- Tradeoffs between parametric (polynomial) vs nonparametric (local linear) methods
- Theory and practice of bandwidth selection (IK, CCT)
- Complete Python implementation and robust inference
Next step: in the next section, we will learn how to test RDD's core assumptions, including the continuity assumption, density tests (the McCrary test), and covariate balance.
Solid theory ensures credible empirics!