
11.1 Chapter Introduction (Regression Discontinuity Design)

Local randomization: when nature creates quasi-experiments for us



Learning Objectives

After completing this chapter, you will be able to:

  • Understand the core ideas of RDD and the principle of local randomization
  • Master the differences and applications of Sharp RDD and Fuzzy RDD
  • Implement RDD validity tests (continuity assumption, density test, covariate balance)
  • Conduct bandwidth selection and robustness analysis
  • Use Python to implement RDD analysis (rdrobust, statsmodels)
  • Replicate classic RDD studies (Angrist & Lavy 1999, Lee 2008, etc.)

Why is RDD the "Most Credible" Quasi-Experimental Method?

Starting with the Counterfactual Idea

Josh Angrist's Perspective:

"RDD is the most credible quasi-experimental design, because it mimics a randomized experiment in a local neighborhood of the cutoff."

In causal inference, we care most about the counterfactual question:

  • Observed outcome: A student's GPA after receiving a scholarship
  • Counterfactual: What would this student's GPA be if they didn't receive the scholarship?

Problem: We can never observe both states simultaneously! (The fundamental problem of causal inference)

RCT's solution:

  • Randomly assign treatment, ensuring treatment and control groups are completely comparable
  • Average outcome in treatment group - Average outcome in control group = Average Treatment Effect (ATE)

RDD's clever approach: When we cannot conduct a randomized experiment but a cutoff rule exists, treatment assignment for individuals near the cutoff is almost random!


The Core Intuition of RDD

Scenario: College Scholarships and Student Performance

Suppose a university has the following rule:

  • College entrance exam score ≥ 600 → Receive scholarship
  • College entrance exam score < 600 → No scholarship

Research question: Does the scholarship improve students' college GPA?

Intuition:

  1. A student with 599 points vs a student with 600 points
  2. These two students are almost identical (ability, family background, study habits, etc.)
  3. The only difference: one just crossed the cutoff and received a scholarship
  4. Therefore, the difference in their GPAs can be attributed to the causal effect of the scholarship!

Illustration: An Ideal RDD

python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Font settings (SimHei covers CJK glyphs if installed; DejaVu Sans is the fallback)
plt.rcParams['font.sans-serif'] = ['SimHei', 'DejaVu Sans']
plt.rcParams['axes.unicode_minus'] = False
sns.set_style("whitegrid")

# Set random seed
np.random.seed(42)

# Generate running variable
x = np.linspace(-50, 50, 1000)
cutoff = 0

# Generate outcome variable
# Left of cutoff (untreated)
y_left = 60 + 0.5 * x[x < cutoff] + np.random.normal(0, 3, sum(x < cutoff))
# Right of cutoff (treated): jump of 10 points
y_right = 70 + 0.5 * x[x >= cutoff] + np.random.normal(0, 3, sum(x >= cutoff))

# Fit polynomials (for drawing smooth curves)
from numpy.polynomial import Polynomial
p_left = Polynomial.fit(x[x < cutoff], y_left, deg=2)
p_right = Polynomial.fit(x[x >= cutoff], y_right, deg=2)

# Plot
fig, ax = plt.subplots(figsize=(14, 8))

# Scatter plot
ax.scatter(x[x < cutoff], y_left, alpha=0.4, s=20, color='blue', label='No Scholarship')
ax.scatter(x[x >= cutoff], y_right, alpha=0.4, s=20, color='red', label='Scholarship')

# Fitted curves
x_left_smooth = np.linspace(x.min(), cutoff, 100)
x_right_smooth = np.linspace(cutoff, x.max(), 100)
ax.plot(x_left_smooth, p_left(x_left_smooth), color='blue', linewidth=3, label='Left Fitted Line')
ax.plot(x_right_smooth, p_right(x_right_smooth), color='red', linewidth=3, label='Right Fitted Line')

# Mark cutoff
ax.axvline(x=cutoff, color='green', linestyle='--', linewidth=2.5, alpha=0.8)
ax.text(cutoff + 2, 45, 'Cutoff', fontsize=14, color='green',
        fontweight='bold', ha='left')

# Annotate RDD effect
y_left_at_cutoff = p_left(cutoff)
y_right_at_cutoff = p_right(cutoff)
rdd_effect = y_right_at_cutoff - y_left_at_cutoff

ax.annotate('', xy=(cutoff + 0.5, y_right_at_cutoff),
            xytext=(cutoff + 0.5, y_left_at_cutoff),
            arrowprops=dict(arrowstyle='<->', color='purple', lw=3.5))
ax.text(cutoff + 3, (y_left_at_cutoff + y_right_at_cutoff) / 2,
        f'RDD Effect\nτ = {rdd_effect:.1f}',
        fontsize=13, color='purple', fontweight='bold',
        bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.4))

# Legend and labels
ax.set_xlabel('Running Variable (Exam Score - 600)', fontsize=14, fontweight='bold')
ax.set_ylabel('Outcome Variable (College GPA)', fontsize=14, fontweight='bold')
ax.set_title('Core Logic of Regression Discontinuity Design (RDD)', fontsize=16, fontweight='bold', pad=20)
ax.legend(loc='upper left', fontsize=12)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('rdd_illustration.png', dpi=300, bbox_inches='tight')
plt.show()

Key observations:

  1. Left of cutoff: Outcome variable follows a smooth curve
  2. Right of cutoff: Outcome variable follows another smooth curve
  3. At the cutoff: A clear jump (discontinuity) appears
  4. RDD effect: The magnitude of the jump is the treatment effect!

Mathematical Expression of RDD

Potential Outcomes Framework

Notation:

  • $X_i$: Running Variable (e.g., exam score)
  • $c$: Cutoff (e.g., 600 points)
  • $D_i \in \{0, 1\}$: Treatment Status
  • $Y_i(0)$: Potential outcome if untreated
  • $Y_i(1)$: Potential outcome if treated
  • $Y_i$: Observed outcome

Sharp RDD: Treatment Completely Determined by Cutoff

Definition: If treatment status is a deterministic function of whether the running variable crosses the cutoff,

$$D_i = \mathbf{1}(X_i \ge c),$$

we call it Sharp RDD.

Key assumption: Continuity Assumption

Assume that the conditional expectations of the potential outcomes are continuous at the cutoff:

$$\mathbb{E}[Y_i(0) \mid X_i = x] \ \text{and} \ \mathbb{E}[Y_i(1) \mid X_i = x] \ \text{are continuous at } x = c$$

In plain language: without treatment, the outcome variable would pass smoothly through the cutoff (no jump).

Identification strategy:

Observed outcomes:

$$Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$$

RDD Estimator:

$$\tau_{\text{RDD}} = \lim_{x \downarrow c} \mathbb{E}[Y_i \mid X_i = x] - \lim_{x \uparrow c} \mathbb{E}[Y_i \mid X_i = x]$$

Why is this a causal effect?

According to the continuity assumption, the observed jump equals the treatment effect at the cutoff:

$$\tau_{\text{RDD}} = \mathbb{E}[Y_i(1) - Y_i(0) \mid X_i = c]$$

Important: RDD identifies the average treatment effect at the cutoff, not the overall ATE!
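
To make the role of each assumption explicit, here is the limit algebra behind the estimator, written out step by step:

latex
% Sharp RDD: from the observed jump to the causal effect at the cutoff
\begin{aligned}
\tau_{\text{RDD}}
  &= \lim_{x \downarrow c} \mathbb{E}[Y_i \mid X_i = x]
   - \lim_{x \uparrow c} \mathbb{E}[Y_i \mid X_i = x] \\
  &= \lim_{x \downarrow c} \mathbb{E}[Y_i(1) \mid X_i = x]
   - \lim_{x \uparrow c} \mathbb{E}[Y_i(0) \mid X_i = x]
   \quad \text{(assignment rule } D_i = \mathbf{1}(X_i \ge c)\text{)} \\
  &= \mathbb{E}[Y_i(1) \mid X_i = c] - \mathbb{E}[Y_i(0) \mid X_i = c]
   \quad \text{(continuity of both conditional means)} \\
  &= \mathbb{E}[Y_i(1) - Y_i(0) \mid X_i = c]
\end{aligned}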


RDD vs RCT: The Perspective of Local Randomization

RDD as a "Local RCT"

Josh Angrist's perspective:

"RDD can be thought of as a local randomized experiment. Near the cutoff, treatment assignment is 'as-if random'."

Intuition:

  1. Far from cutoff: High-scoring and low-scoring students differ greatly (ability, family background, etc.)
  2. Close to cutoff: Students with 599 and 600 points are almost identical
  3. At the cutoff: Treatment assignment is almost random (who gets exactly 600 has a luck component)

Formalization:

Within a small neighborhood $(c - \varepsilon, c + \varepsilon)$ of the cutoff, assume treatment is as-if randomly assigned:

$$\bigl(Y_i(0), Y_i(1)\bigr) \perp D_i \ \Big|\ X_i \in (c - \varepsilon, c + \varepsilon)$$

This mirrors balance in an RCT: near the cutoff, treatment and control groups are similar on all covariates.
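
One way to make this assumption tangible is to check covariate balance inside a narrow window. A minimal self-contained sketch; the covariate age, the window width, and the simulated data are illustrative assumptions, not from any real study:

python
# Local randomization check: compare covariate means in a narrow window.
import numpy as np
from scipy import stats

np.random.seed(0)
x = np.random.uniform(-50, 50, 2000)                  # centered running variable
age = 20 + 0.02 * x + np.random.normal(0, 2, 2000)    # smooth in x, no jump

window = 5                                            # narrow neighborhood of the cutoff (0)
near = np.abs(x) <= window
below, above = age[near & (x < 0)], age[near & (x >= 0)]

# If assignment is as-if random near the cutoff, the means should not differ.
t, p = stats.ttest_ind(below, above)
print(f"Mean below: {below.mean():.2f}, above: {above.mean():.2f}, p = {p:.3f}")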

RDD vs DID: When to Use Which Method?

| Feature | RDD | DID |
| --- | --- | --- |
| Data requirement | Cross-section or single-period panel | Multi-period panel (at least 2 periods) |
| Identification source | Jump at the cutoff | Double difference across time and group |
| Core assumption | Continuity assumption | Parallel trends assumption |
| External validity | Local effect (at the cutoff) | May be broader |
| Internal validity | Very high (close to RCT) | Depends on parallel trends |
| Classic cases | Scholarships, elections | Minimum wage, environmental policy |

Rule of thumb:

  • If there's a clear cutoff rule → Use RDD
  • If there's spatial-temporal variation in policy → Use DID
  • If you can conduct random assignment → Do an RCT directly!

Empirical Implementation of Sharp RDD

Linear Regression Approach

The simplest RDD estimation: fit two linear regression lines near the cutoff.

Model:

$$Y_i = \alpha + \tau D_i + \beta \tilde{X}_i + \gamma D_i \tilde{X}_i + \varepsilon_i, \qquad \tilde{X}_i \equiv X_i - c$$

Parameter interpretation:

  • $\tau$: RDD effect (jump at the cutoff) ⭐
  • $\beta$: slope left of the cutoff
  • $\gamma$: additional slope right of the cutoff (total right-side slope = $\beta + \gamma$)

Key: Center the running variable ($\tilde{X}_i = X_i - c$) so that $\tau$ is the effect exactly at the cutoff.

Polynomial Approach

Allow the relationship between the outcome and the running variable to be nonlinear:

$$Y_i = \alpha + \tau D_i + \sum_{k=1}^{p} \beta_k \tilde{X}_i^{\,k} + \sum_{k=1}^{p} \gamma_k D_i \tilde{X}_i^{\,k} + \varepsilon_i$$

Warning: High-order polynomials ($p \ge 3$) are prone to overfitting and can produce noisy, misleading jumps at the cutoff (Gelman & Imbens 2019)!
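
To see the Gelman-Imbens point in action, here is a small sketch that re-estimates the jump under increasing polynomial orders on simulated data (the DGP and sample size are illustrative); watch how the estimate and its standard error behave as $p$ grows:

python
# Sensitivity of the RDD estimate to polynomial order (illustrative sketch).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

np.random.seed(1)
n = 1000
x = np.random.uniform(-50, 50, n)
d = (x >= 0).astype(int)
y = 50 + 0.5 * x + 10 * d + np.random.normal(0, 5, n)   # true effect = 10
df_poly = pd.DataFrame({'Y': y, 'D': d, 'X': x})

for p in range(1, 5):
    # Interacted polynomial terms up to order p, allowing different curves on each side.
    terms = " + ".join(f"I(X**{k}) + D:I(X**{k})" for k in range(1, p + 1))
    m = smf.ols(f"Y ~ D + {terms}", data=df_poly).fit()
    print(f"p = {p}: tau_hat = {m.params['D']:.2f} (SE = {m.bse['D']:.2f})")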

Local Linear Regression

Modern best practice (Calonico, Cattaneo, Titiunik 2014):

  1. Choose a bandwidth $h$: use only observations with $|X_i - c| \le h$
  2. Kernel weighting: observations closer to the cutoff receive higher weight (e.g., the triangular kernel $K(u) = (1 - |u|)\,\mathbf{1}(|u| \le 1)$)
  3. Fit a kernel-weighted local linear regression:

$$\min_{\alpha, \tau, \beta, \gamma} \ \sum_{i} K\!\left(\frac{X_i - c}{h}\right)\bigl(Y_i - \alpha - \tau D_i - \beta \tilde{X}_i - \gamma D_i \tilde{X}_i\bigr)^2$$

Advantages:

  • Optimal bias-variance tradeoff
  • Minimal functional form assumptions
  • Modern packages (like rdrobust) implement automatically
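
rdrobust automates all three steps above. For transparency, here is a hand-rolled sketch of the same mechanics using a triangular kernel, a fixed (deliberately non-optimal) bandwidth, and weighted least squares; it illustrates the estimator only, not rdrobust's bias correction or robust inference:

python
# Hand-rolled local linear RDD: triangular kernel + WLS with a fixed bandwidth.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

np.random.seed(2)
n = 2000
x = np.random.uniform(-50, 50, n)
d = (x >= 0).astype(int)
y = 50 + 0.5 * x + 10 * d + np.random.normal(0, 5, n)   # true effect = 10
df_loc = pd.DataFrame({'Y': y, 'D': d, 'X': x})

h = 15                                           # fixed bandwidth, for illustration only
local = df_loc[df_loc['X'].abs() <= h].copy()
local['w'] = 1 - local['X'].abs() / h            # triangular kernel weights

m = smf.wls('Y ~ D + X + D:X', data=local, weights=local['w']).fit()
print(f"tau_hat = {m.params['D']:.2f} (SE = {m.bse['D']:.2f})")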

Python Implementation: Simple Example

Simulate Sharp RDD Data

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf
from scipy import stats

# Setup
np.random.seed(123)
n = 1000
cutoff = 0

# Generate running variable
X = np.random.uniform(-50, 50, n)

# Generate treatment status
D = (X >= cutoff).astype(int)

# Generate outcome variable
# True DGP: Y = 50 + 0.5*X + 10*D + noise
# This means treatment effect = 10
true_effect = 10
Y = 50 + 0.5 * X + true_effect * D + np.random.normal(0, 5, n)

# Create dataframe
df = pd.DataFrame({
    'X': X,
    'D': D,
    'Y': Y,
    'X_centered': X - cutoff
})

print("=" * 70)
print("Sharp RDD Simulated Data")
print("=" * 70)
print(f"Sample size: {n}")
print(f"Cutoff: {cutoff}")
print(f"True treatment effect: {true_effect}")
print(f"Treated units: {D.sum()} ({D.sum()/n*100:.1f}%)")
print("\nData preview:")
print(df.head(10))

Visualization: Scatter Plot + Fitted Lines

python
# Fit separately by group
df_left = df[df['D'] == 0]
df_right = df[df['D'] == 1]

# OLS fit on each side (fit on NumPy arrays so later array/list predictions match)
from sklearn.linear_model import LinearRegression
lr_left = LinearRegression().fit(df_left[['X_centered']].to_numpy(), df_left['Y'])
lr_right = LinearRegression().fit(df_right[['X_centered']].to_numpy(), df_right['Y'])

# Predict
X_left_range = np.linspace(df_left['X_centered'].min(), 0, 100).reshape(-1, 1)
X_right_range = np.linspace(0, df_right['X_centered'].max(), 100).reshape(-1, 1)
Y_left_pred = lr_left.predict(X_left_range)
Y_right_pred = lr_right.predict(X_right_range)

# Plot
fig, ax = plt.subplots(figsize=(14, 8))

# Scatter plot (using binning to reduce visual clutter)
bins = 20
df['X_bin'] = pd.cut(df['X_centered'], bins=bins)
df_binned = df.groupby(['X_bin', 'D'], observed=True).agg({'Y': 'mean', 'X_centered': 'mean'}).reset_index()  # observed=True drops empty bin-group combinations

df_binned_left = df_binned[df_binned['D'] == 0]
df_binned_right = df_binned[df_binned['D'] == 1]

ax.scatter(df_binned_left['X_centered'], df_binned_left['Y'],
           s=100, alpha=0.6, color='blue', edgecolors='black', linewidths=1.5,
           label='Untreated (binned means)')
ax.scatter(df_binned_right['X_centered'], df_binned_right['Y'],
           s=100, alpha=0.6, color='red', edgecolors='black', linewidths=1.5,
           label='Treated (binned means)')

# Fitted lines
ax.plot(X_left_range, Y_left_pred, color='blue', linewidth=3, label='Left Fitted Line')
ax.plot(X_right_range, Y_right_pred, color='red', linewidth=3, label='Right Fitted Line')

# Cutoff
ax.axvline(x=0, color='green', linestyle='--', linewidth=2.5, alpha=0.7)

# Annotate effect
y_left_at_cutoff = lr_left.predict([[0]])[0]
y_right_at_cutoff = lr_right.predict([[0]])[0]
estimated_effect = y_right_at_cutoff - y_left_at_cutoff

ax.annotate('', xy=(0.5, y_right_at_cutoff), xytext=(0.5, y_left_at_cutoff),
            arrowprops=dict(arrowstyle='<->', color='purple', lw=3))
ax.text(1, (y_left_at_cutoff + y_right_at_cutoff) / 2,
        f'Estimated Effect\n= {estimated_effect:.2f}',
        fontsize=12, color='purple', fontweight='bold',
        bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.3))

ax.set_xlabel('X - Cutoff', fontsize=13, fontweight='bold')
ax.set_ylabel('Y', fontsize=13, fontweight='bold')
ax.set_title(f'Sharp RDD Example (True Effect = {true_effect})',
             fontsize=15, fontweight='bold')
ax.legend(loc='upper left', fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Regression Estimation

python
# Method 1: Full sample linear RDD
model1 = smf.ols('Y ~ D + X_centered + D:X_centered', data=df).fit()

print("\n" + "=" * 70)
print("Method 1: Full Sample Linear RDD")
print("=" * 70)
print(model1.summary().tables[1])
print(f"\nEstimated RDD effect: {model1.params['D']:.3f}")
print(f"Standard error: {model1.bse['D']:.3f}")
print(f"95% Confidence interval: [{model1.conf_int().loc['D', 0]:.3f}, {model1.conf_int().loc['D', 1]:.3f}]")

# Method 2: Bandwidth restriction (use only observations near cutoff)
bandwidth = 20
df_local = df[np.abs(df['X_centered']) <= bandwidth].copy()

model2 = smf.ols('Y ~ D + X_centered + D:X_centered', data=df_local).fit()

print("\n" + "=" * 70)
print(f"Method 2: Local Linear RDD (bandwidth = {bandwidth})")
print("=" * 70)
print(f"Observations used: {len(df_local)} / {len(df)} ({len(df_local)/len(df)*100:.1f}%)")
print(model2.summary().tables[1])
print(f"\nEstimated RDD effect: {model2.params['D']:.3f}")
print(f"Standard error: {model2.bse['D']:.3f}")

# Comparison
print("\n" + "=" * 70)
print("Effect Estimate Comparison")
print("=" * 70)
print(f"True effect:           {true_effect:.3f}")
print(f"Full sample estimate:  {model1.params['D']:.3f} (SE = {model1.bse['D']:.3f})")
print(f"Local estimate (h={bandwidth}): {model2.params['D']:.3f} (SE = {model2.bse['D']:.3f})")

Output interpretation:

  • Both methods should be close to the true effect of 10
  • Local estimate typically has larger standard errors (smaller sample size)
  • But local estimate has smaller bias (weaker functional form assumptions)
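
A standard robustness exercise is to trace the estimate across a grid of bandwidths and check that it is stable. A sketch, reusing the simulated df from above (the grid values are arbitrary):

python
# Bandwidth sensitivity: re-estimate tau over a grid of bandwidths.
# Assumes `df` (with Y, D, X_centered) from the simulation above.
import statsmodels.formula.api as smf

results = []
for h in [5, 10, 15, 20, 30, 40, 50]:
    sub = df[df['X_centered'].abs() <= h]
    m = smf.ols('Y ~ D + X_centered + D:X_centered', data=sub).fit()
    results.append((h, m.params['D'], m.bse['D'], len(sub)))

print(f"{'h':>4} {'tau_hat':>8} {'SE':>6} {'n':>6}")
for h, tau, se, n_obs in results:
    print(f"{h:>4} {tau:>8.3f} {se:>6.3f} {n_obs:>6}")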

Fuzzy RDD: Imperfect Cutoffs

What is Fuzzy RDD?

In reality, the cutoff rule may be imperfect:

  • Sharp RDD: $P(D_i = 1 \mid X_i \ge c) = 1$ and $P(D_i = 1 \mid X_i < c) = 0$
  • Fuzzy RDD: $P(D_i = 1 \mid X_i = x)$ jumps at the cutoff, but not from 0 to 1

Examples:

  1. College admission: Cutoff at 600, but special cases (sports, minority bonuses, etc.)
  2. Medicare: Auto-enrollment at age 65, but some purchase early

Fuzzy RDD Identification

Idea: Use the cutoff as an instrumental variable (IV)!

Two-stage regression, with the cutoff indicator $Z_i = \mathbf{1}(X_i \ge c)$:

First stage: Use the cutoff to predict treatment status

$$D_i = \pi_0 + \pi_1 Z_i + f(X_i - c) + v_i$$

Second stage: Use predicted treatment to estimate the effect

$$Y_i = \alpha + \tau_{\text{FRDD}} \hat{D}_i + g(X_i - c) + \varepsilon_i$$

Fuzzy RDD estimator:

$$\tau_{\text{FRDD}} = \frac{\lim_{x \downarrow c} \mathbb{E}[Y_i \mid X_i = x] - \lim_{x \uparrow c} \mathbb{E}[Y_i \mid X_i = x]}{\lim_{x \downarrow c} \mathbb{E}[D_i \mid X_i = x] - \lim_{x \uparrow c} \mathbb{E}[D_i \mid X_i = x]}$$

Interpretation:

  • Numerator: jump in the outcome at the cutoff (the Reduced Form)
  • Denominator: jump in treatment probability at the cutoff (the First Stage)
  • Ratio: the local average treatment effect (LATE) for compliers at the cutoff

Connection to IV: Fuzzy RDD is essentially IV estimation, with the cutoff indicator $Z_i = \mathbf{1}(X_i \ge c)$ serving as the instrument!
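
A minimal sketch of the Wald ratio using two OLS regressions on simulated data with imperfect compliance (the compliance rates and DGP are illustrative assumptions):

python
# Fuzzy RDD as a Wald ratio: reduced-form jump / first-stage jump.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

np.random.seed(3)
n = 2000
x = np.random.uniform(-50, 50, n)
z = (x >= 0).astype(int)                        # cutoff indicator (the instrument)
# Imperfect compliance: crossing the cutoff raises P(treatment) by ~60 pp.
d = (np.random.uniform(size=n) < 0.2 + 0.6 * z).astype(int)
y = 50 + 0.5 * x + 10 * d + np.random.normal(0, 5, n)   # true effect = 10
df_fz = pd.DataFrame({'Y': y, 'D': d, 'Z': z, 'X': x})

rf = smf.ols('Y ~ Z + X + Z:X', data=df_fz).fit()   # reduced form: jump in Y
fs = smf.ols('D ~ Z + X + Z:X', data=df_fz).fit()   # first stage: jump in D
tau_fuzzy = rf.params['Z'] / fs.params['Z']
print(f"Reduced form: {rf.params['Z']:.2f}, First stage: {fs.params['Z']:.2f}")
print(f"Fuzzy RDD (Wald) estimate: {tau_fuzzy:.2f}")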


Preview of Classic RDD Applications

Case 1: Thistlethwaite & Campbell (1960) - Birth of RDD

Research question: Does receiving a National Merit award affect students' future scholarship attainment?

Design:

  • Cutoff: Threshold on national exam score
  • Treatment: Receive merit award
  • Outcome: Number of subsequent scholarships

Finding: A significant positive effect on subsequent scholarship receipt

Historical significance: This was the first application of RDD (1960)!

Case 2: Angrist & Lavy (1999) - Class Size and Student Achievement

Research question: Does reducing class size improve student performance?

Background:

  • Israel has a rule (Maimonides' Rule): Class size cannot exceed 40 students
  • If school has 41 students → Must split into 2 classes (~20 each)
  • If school has 40 students → 1 class (40 students)

Design:

  • Running variable: Grade enrollment in the school
  • Cutoffs: 40, 80, 120, ... (multiples of 40)
  • Treatment: Class size (determined by rule)
  • Outcome: Standardized test scores

Finding: Smaller classes significantly improve test scores for fourth and fifth graders, with effects on the order of 0.1-0.2 standard deviations for sizable class-size reductions

Innovation: Classic application of Fuzzy RDD (rule not perfectly enforced)
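
The instrument behind the design is the class size predicted by the rule: enrollment divided by the smallest number of classes that keeps each at or below 40. A short sketch of that formula and its sawtooth pattern:

python
# Predicted class size under Maimonides' rule (Angrist & Lavy 1999).
import numpy as np
import matplotlib.pyplot as plt

def maimonides_class_size(enrollment):
    """Predicted class size: enrollment split into the fewest classes of <= 40 each."""
    n_classes = (enrollment - 1) // 40 + 1
    return enrollment / n_classes

enrollment = np.arange(1, 161)
predicted = np.array([maimonides_class_size(e) for e in enrollment])

plt.figure(figsize=(10, 5))
plt.step(enrollment, predicted, where='post')
plt.axhline(40, linestyle='--', alpha=0.5)
plt.xlabel('Grade Enrollment')
plt.ylabel('Predicted Class Size')
plt.title("Maimonides' Rule: Discontinuities at 40, 80, 120, ...")
plt.show()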

Case 3: Lee (2008) - Electoral Advantage and Re-election

Research question: Does incumbency status confer re-election advantage?

Design:

  • Cutoff: Vote share = 50%
  • Treatment: Become incumbent
  • Outcome: Vote share and victory in the next election

Key intuition:

  • Candidate with 49.9% vs candidate with 50.1%
  • These two are almost identical (political strength, funding, voter support, etc.)
  • Only difference: One wins, one loses

Finding: A large incumbency advantage: barely winning raises the probability of winning the next election by roughly 40-45 percentage points!


Core Assumptions of RDD

Assumption 1: Continuity Assumption ⭐

Assumption: At the cutoff, all factors except treatment status are continuous.

Mathematical expression:

$$\mathbb{E}[Y_i(0) \mid X_i = x] \ \text{and} \ \mathbb{E}[Y_i(1) \mid X_i = x] \ \text{are continuous at } x = c$$

In plain language: without treatment, the outcome variable does not jump at the cutoff.

How to test? (Section 3 discusses in detail)

  1. Covariate balance tests: Check if covariates (age, gender, etc.) are balanced on both sides of cutoff
  2. Density test (McCrary Test): Check if density of running variable is smooth at cutoff
  3. Placebo tests: Test using false cutoffs
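
As a preview of the placebo idea in item 3, here is a sketch on simulated data: re-estimate the "jump" at false cutoffs, using only observations on one side of the true cutoff so the real effect cannot contaminate the placebo (all numbers are illustrative):

python
# Placebo test: estimate "effects" at false cutoffs where none should exist.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

np.random.seed(4)
n = 2000
x = np.random.uniform(-50, 50, n)
y = 50 + 0.5 * x + 10 * (x >= 0) + np.random.normal(0, 5, n)   # true cutoff at 0
df_pl = pd.DataFrame({'Y': y, 'X': x})

for c in [-25, -10, 10, 25]:
    # Keep only observations on the same side of the true cutoff as the placebo c.
    sub = df_pl[df_pl['X'] < 0] if c < 0 else df_pl[df_pl['X'] >= 0]
    sub = sub[(sub['X'] - c).abs() <= 10].copy()
    sub['Dp'] = (sub['X'] >= c).astype(int)
    sub['Xc'] = sub['X'] - c
    m = smf.ols('Y ~ Dp + Xc + Dp:Xc', data=sub).fit()
    print(f"Placebo c = {c:>4}: tau_hat = {m.params['Dp']:.3f} (p = {m.pvalues['Dp']:.3f})")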

Assumption 2: No Precise Manipulation

Assumption: Individuals cannot precisely manipulate the running variable to just cross the cutoff.

Threats:

  • Exam cheating: Students know 600 is cutoff, cheat to get exactly 600
  • Election fraud: Candidates manipulate votes to get just over 50%
  • Policy lobbying: Firms lobby government to stay just below regulatory threshold

How to test?

  • McCrary density test: Check for abnormal bunching at cutoff
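
The formal test comes in Section 3; a crude first look is simply to plot the histogram of the running variable and eyeball the cutoff. This sketch fabricates manipulation so the contrast is visible (the shift mechanism is purely illustrative):

python
# Crude bunching check: compare histograms of the running variable at the cutoff.
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(5)
x_clean = np.random.uniform(-50, 50, 2000)       # no manipulation
x_manip = x_clean.copy()
shift = (x_manip > -3) & (x_manip < 0)           # units just below the cutoff...
x_manip[shift] += 3                              # ...push themselves just above it

fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
for ax, data, title in [(axes[0], x_clean, 'Smooth density (OK)'),
                        (axes[1], x_manip, 'Bunching above cutoff (suspicious)')]:
    ax.hist(data, bins=40, edgecolor='black', alpha=0.7)
    ax.axvline(0, color='red', linestyle='--')
    ax.set_title(title)
    ax.set_xlabel('Running Variable')
plt.tight_layout()
plt.show()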

Assumption 3: Local Exclusion

Assumption: Near the cutoff, crossing the threshold affects the outcome only through the treatment of interest; nothing else changes discontinuously at the same cutoff.

Threat:

  • If crossing the 600-point line triggers anything besides the scholarship (e.g., an honors label that directly boosts confidence and GPA), the RDD estimate bundles that effect with the scholarship's

Rule of thumb: Designs are most credible when the running variable is hard to game (birth date, lottery number) and the cutoff triggers only the one treatment under study


Chapter Structure

Section 1: Chapter Introduction (Current)

  • Core ideas of RDD and counterfactual framework
  • Sharp RDD vs Fuzzy RDD
  • Comparison with RCT and DID
  • Python basic implementation

Section 2: RDD Fundamentals and Identification

  • Mathematical derivation of Sharp RDD
  • Fuzzy RDD and instrumental variables
  • Local average treatment effect (LATE)
  • Linear vs nonparametric methods

Section 3: Continuity Assumption and Validity Tests

  • Testing the continuity assumption
  • Covariate balance tests
  • Density test (McCrary Test)
  • Placebo tests

Section 4: Bandwidth Selection and Robustness Tests

  • Optimal bandwidth selection (IK, CCT)
  • Sensitivity analysis
  • Polynomial order selection
  • Donut-hole RDD

Section 5: Classic Cases and Python Implementation

  • Angrist & Lavy (1999) Class size
  • Lee (2008) Electoral advantage
  • Carpenter & Dobkin (2009) Minimum drinking age
  • Best practices using rdrobust package

Section 6: Chapter Summary

  • RDD methodology summary
  • Common pitfalls and best practices
  • Practice exercises
  • Literature recommendations

Python Toolkit

Core Libraries

| Package | Main Functions | Installation |
| --- | --- | --- |
| pandas | Data manipulation | `pip install pandas` |
| numpy | Numerical computation | `pip install numpy` |
| statsmodels | OLS regression | `pip install statsmodels` |
| rdrobust | RDD optimal bandwidth and robust inference | `pip install rdrobust` |
| rddtools | RDD toolkit | (install from source) |
| matplotlib | Visualization | `pip install matplotlib` |
| seaborn | Advanced visualization | `pip install seaborn` |

Basic Setup

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

# Font settings (choose based on OS)
plt.rcParams['font.sans-serif'] = ['SimHei', 'Arial Unicode MS', 'DejaVu Sans']
plt.rcParams['axes.unicode_minus'] = False

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 7)
pd.set_option('display.float_format', '{:.4f}'.format)

rdrobust Package Installation

bash
# Python version
pip install rdrobust

# Or using conda
conda install -c conda-forge rdrobust

Usage example:

python
from rdrobust import rdrobust, rdbwselect, rdplot

# Automatic bandwidth selection and robust inference
result = rdrobust(y=Y, x=X, c=cutoff)
print(result)

# Plot RDD
rdplot(y=Y, x=X, c=cutoff, nbins=20)

Essential Reading

Foundational Papers

  1. Thistlethwaite, D. L., & Campbell, D. T. (1960). "Regression-discontinuity analysis: An alternative to the ex post facto experiment." Journal of Educational Psychology, 51(6), 309-317.

    • Birth of RDD method
  2. Hahn, J., Todd, P., & Van der Klaauw, W. (2001). "Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design." Econometrica, 69(1), 201-209.

    • Modern identification theory for RDD
  3. Lee, D. S., & Lemieux, T. (2010). "Regression Discontinuity Designs in Economics." Journal of Economic Literature, 48(2), 281-355.

    • Must-read review, the bible of RDD

Methodological Breakthroughs

  1. Imbens, G., & Kalyanaraman, K. (2012). "Optimal Bandwidth Choice for the Regression Discontinuity Estimator." Review of Economic Studies, 79(3), 933-959.

    • Optimal bandwidth selection (IK method)
  2. Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). "Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs." Econometrica, 82(6), 2295-2326.

    • Robust inference (CCT method)
  3. Gelman, A., & Imbens, G. (2019). "Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs." Journal of Business & Economic Statistics, 37(3), 447-456.

    • Warning: Don't use high-order polynomials!

Classic Applications

  1. Angrist, J. D., & Lavy, V. (1999). "Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement." Quarterly Journal of Economics, 114(2), 533-575.

  2. Lee, D. S. (2008). "Randomized Experiments from Non-random Selection in U.S. House Elections." Journal of Econometrics, 142(2), 675-697.

Textbook Treatments

  1. Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics, Chapter 6.
  2. Cunningham, S. (2021). Causal Inference: The Mixtape, Chapter 6.
  3. Huntington-Klein, N. (2022). The Effect, Chapter 20.

Ready to Begin?

RDD is the quasi-experimental method closest to randomized experiments. Master it, and you'll be able to:

  • Identify causal effects in the absence of randomized experiments
  • Leverage policy rules and natural cutoffs for research
  • Publish high-quality causal inference studies

Remember the core idea:

"In the neighborhood of the cutoff, RDD is as good as a randomized experiment. The discontinuity is your friend." — Joshua Angrist

Let's dive into Section 2: RDD Fundamentals and Identification!


Local randomization: a powerful tool for causal inference!

Released under the MIT License. Content © Author.