
11.2 RDD Fundamentals and Identification

"Regression discontinuity is one of the most credible quasi-experimental strategies."— Guido Imbens & Thomas Lemieux, RDD Review Authors

From potential outcomes framework to local randomization: the theoretical foundations of RDD


Section Overview

In this section, we will explore in depth:

  • Rigorous mathematical derivation of Sharp RDD
  • Relationship between Fuzzy RDD and instrumental variables (IV)
  • Meaning of local average treatment effect (LATE)
  • Parametric vs nonparametric estimation methods
  • Bandwidth selection tradeoffs

Identification Theory of Sharp RDD

Reviewing the Potential Outcomes Framework

In the Rubin Causal Model, each individual has two potential outcomes:

  • $Y_i(1)$: outcome if individual $i$ is treated
  • $Y_i(0)$: outcome if individual $i$ is untreated

Individual treatment effect:

$$\tau_i = Y_i(1) - Y_i(0)$$

Fundamental problem: We can never observe both $Y_i(1)$ and $Y_i(0)$ for the same individual!

Observed outcome:

$$Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$$

where $D_i \in \{0, 1\}$ is the treatment indicator.

RDD Identification Logic

Core idea: Near the cutoff $c$, treatment assignment is "quasi-random".

Sharp RDD assignment rule:

$$D_i = \mathbf{1}(X_i \geq c)$$

Key assumption: Continuity Assumption

For $d \in \{0, 1\}$, assume that the conditional expectation function $E[Y_i(d) \mid X_i = x]$ is continuous in $x$ at $x = c$.

Intuition:

  • At the cutoff $c$, the potential outcome functions $E[Y_i(0) \mid X_i = x]$ and $E[Y_i(1) \mid X_i = x]$ are continuous
  • In other words, small changes in the running variable $X_i$ near $c$ don't cause jumps in potential outcomes

Deriving the Identification Strategy

Observed conditional expectation:

$$E[Y_i \mid X_i = x] = \begin{cases} E[Y_i(1) \mid X_i = x] & \text{if } x \geq c \\ E[Y_i(0) \mid X_i = x] & \text{if } x < c \end{cases}$$

Right of cutoff ($x \geq c$, everyone treated):

$$\lim_{x \downarrow c} E[Y_i \mid X_i = x] = \lim_{x \downarrow c} E[Y_i(1) \mid X_i = x]$$

Left of cutoff ($x < c$, everyone untreated):

$$\lim_{x \uparrow c} E[Y_i \mid X_i = x] = \lim_{x \uparrow c} E[Y_i(0) \mid X_i = x]$$

RDD estimator:

$$\tau_{SRD} = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]$$

Why is this a causal effect?

According to the continuity assumption:

$$\tau_{SRD} = E[Y_i(1) \mid X_i = c] - E[Y_i(0) \mid X_i = c] = E[Y_i(1) - Y_i(0) \mid X_i = c]$$

Key points:

  1. RDD estimates the treatment effect at the cutoff, not the overall ATE
  2. This is a local effect (Local Average Treatment Effect, LATE)
  3. Extrapolation to other values of the running variable requires additional assumptions (e.g., effect homogeneity)
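
As a quick illustration of the estimator above (a minimal sketch, not the recommended rdrobust workflow shown later), the snippet below fits a separate linear regression on each side of the cutoff and takes the difference of the two intercepts as the jump. It assumes a DataFrame df with outcome Y and a centered running variable X_c = X - c, as constructed in the examples later in this section; the bandwidth of 10 is an arbitrary choice.

python
import statsmodels.formula.api as smf

h = 10  # illustrative bandwidth; see the bandwidth discussion below

# Observations within h of the cutoff, split by side (X_c = X - c is assumed)
left = df[(df['X_c'] >= -h) & (df['X_c'] < 0)]
right = df[(df['X_c'] >= 0) & (df['X_c'] <= h)]

# Each intercept approximates the one-sided limit of E[Y | X = x] at x = c
lim_left = smf.ols('Y ~ X_c', data=left).fit().params['Intercept']
lim_right = smf.ols('Y ~ X_c', data=right).fit().params['Intercept']

tau_srd = lim_right - lim_left  # estimated jump at the cutoff
print(f"Sharp RDD estimate: {tau_srd:.2f}")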

Fuzzy RDD: Imperfect Cutoff Rules

Motivation for Fuzzy RDD

In reality, treatment assignment rules may be imperfect:

Example 1: College admission

  • Rule: Exam score ≥ 600 → Admitted
  • Reality: Special admissions (sports, arts, principal recommendations, etc.)
  • Result: the admission probability jumps at 600 but not from 0 to 1 (for example, $P(D_i = 1 \mid X_i < 600) > 0$ because of special admissions)

Example 2: Medicare

  • Rule: Age ≥ 65 → Automatically enrolled in Medicare
  • Reality: Some purchase early, some choose not to participate
  • Result: Treatment jumps at cutoff but not from 0 to 1

Definition of Fuzzy RDD

Sharp RDD:

$$P(D_i = 1 \mid X_i = x) = \mathbf{1}(x \geq c)$$

Fuzzy RDD:

$$\lim_{x \downarrow c} P(D_i = 1 \mid X_i = x) \neq \lim_{x \uparrow c} P(D_i = 1 \mid X_i = x)$$

But the two limits are not necessarily a perfect 0 and 1.

Key condition:

$$\lim_{x \downarrow c} P(D_i = 1 \mid X_i = x) - \lim_{x \uparrow c} P(D_i = 1 \mid X_i = x) \neq 0$$

That is: Treatment probability jumps at the cutoff.

Fuzzy RDD Identification: Instrumental Variables Approach

Core idea: Use the cutoff indicator as an instrumental variable (IV)!

Define instrument:

$$Z_i = \mathbf{1}(X_i \geq c)$$

Three key IV conditions:

  1. Relevance: $Z_i$ is correlated with treatment $D_i$

  2. Exclusion: $Z_i$ affects $Y_i$ only through $D_i$ (after controlling for $X_i$)

  3. Monotonicity: Crossing the cutoff doesn't make anyone switch from treated to untreated

Fuzzy RDD estimator:

Reduced Form:

$$\lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]$$

First Stage:

$$\lim_{x \downarrow c} E[D_i \mid X_i = x] - \lim_{x \uparrow c} E[D_i \mid X_i = x]$$

Fuzzy RDD effect (Wald estimator):

$$\tau_{FRD} = \frac{\lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]}{\lim_{x \downarrow c} E[D_i \mid X_i = x] - \lim_{x \uparrow c} E[D_i \mid X_i = x]}$$
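
To make the Wald logic concrete, here is a minimal sketch (not the rdrobust implementation used later): within a bandwidth it estimates the reduced-form jump in Y and the first-stage jump in D with local linear regressions and takes their ratio. It assumes a DataFrame with columns Y, D, and a centered running variable X_c, as in the fuzzy example later in this section; the function name and bandwidth are illustrative.

python
import numpy as np
import statsmodels.formula.api as smf

def fuzzy_rdd_wald(df, h):
    """Wald estimator: outcome jump divided by treatment-probability jump."""
    local = df[np.abs(df['X_c']) <= h].copy()
    local['Z'] = (local['X_c'] >= 0).astype(int)  # cutoff indicator as instrument

    # Reduced form: jump in the outcome at the cutoff
    reduced_form = smf.ols('Y ~ Z + X_c + Z:X_c', data=local).fit().params['Z']
    # First stage: jump in the treatment probability at the cutoff
    first_stage = smf.ols('D ~ Z + X_c + Z:X_c', data=local).fit().params['Z']

    return reduced_form / first_stage

# Example usage with the fuzzy data generated in Example 2 below:
# tau_frd = fuzzy_rdd_wald(df_fuzzy, h=10)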

Causal Interpretation of Fuzzy RDD

Local Average Treatment Effect (LATE):

Fuzzy RDD estimates the treatment effect for compliers.

Four types of individuals (based on the potential treatment states $D_i(1)$ and $D_i(0)$, i.e., treatment status when above vs below the cutoff):

  1. Always-takers: $D_i(1) = D_i(0) = 1$ (treated regardless of the cutoff)
  2. Never-takers: $D_i(1) = D_i(0) = 0$ (untreated regardless of the cutoff)
  3. Compliers: $D_i(1) = 1,\ D_i(0) = 0$ (treated only if they cross the cutoff)
  4. Defiers: $D_i(1) = 0,\ D_i(0) = 1$ (untreated if they cross the cutoff, treated otherwise)

Monotonicity assumption rules out the existence of Defiers.

LATE theorem (Imbens & Angrist 1994):

Interpretation:

  • Fuzzy RDD estimates the average treatment effect for compliers (those whose treatment status changes when they cross the cutoff)
  • For Always-takers and Never-takers, we cannot identify treatment effects
  • If compliers differ from the overall population, LATE ≠ ATE

Parametric vs Nonparametric Estimation Methods

Method 1: Global Polynomial Regression

Model ($p$-th order polynomial, with centered running variable $\tilde{X}_i = X_i - c$):

$$Y_i = \alpha + \tau D_i + \sum_{k=1}^{p} \beta_k \tilde{X}_i^k + \sum_{k=1}^{p} \gamma_k D_i \tilde{X}_i^k + \varepsilon_i$$

Advantages:

  • Simple and intuitive
  • Uses all data, high efficiency

Disadvantages:

  • Sensitive to functional form assumptions (the choice of polynomial order $p$ is critical)
  • Data far from the cutoff can bias estimates
  • Gelman & Imbens (2019) warn: do not use high-order global polynomials!

Python implementation:

python
import statsmodels.formula.api as smf

# Second-order polynomial
model = smf.ols('Y ~ D + X_c + I(X_c**2) + D:X_c + D:I(X_c**2)', data=df).fit()
print(model.summary())

Method 2: Local Linear Regression

Approach: Use only data within a bandwidth $h$ of the cutoff and fit a linear model.

Model (for $|\tilde{X}_i| \leq h$):

$$Y_i = \alpha + \tau D_i + \beta \tilde{X}_i + \gamma D_i \tilde{X}_i + \varepsilon_i$$

Bandwidth $h$:

  • Too small: High variance (less data)
  • Too large: High bias (functional form assumptions may be wrong)

Advantages:

  • Weakest functional form assumptions (only locally linear)
  • Theoretically well-behaved at boundary points such as the cutoff (Fan & Gijbels 1996)
  • Modern software (like rdrobust) automatically selects optimal bandwidth

Python implementation:

python
# Manual bandwidth selection
h = 10
df_local = df[np.abs(df['X_c']) <= h]
model_local = smf.ols('Y ~ D + X_c + D:X_c', data=df_local).fit()

# Using rdrobust (automatic bandwidth)
from rdrobust import rdrobust
result = rdrobust(y=df['Y'], x=df['X'], c=cutoff)
print(result)

Method 3: Kernel-Weighted Local Linear Regression

Further improvement: Give higher weight to observations closer to cutoff.

Triangular kernel function:

$$K(u) = \max(0,\ 1 - |u|)$$

Weight:

$$w_i = K\!\left(\frac{X_i - c}{h}\right) = \max\!\left(0,\ 1 - \frac{|X_i - c|}{h}\right)$$

Weighted regression:

$$\min_{\alpha, \tau, \beta, \gamma} \sum_i w_i \big(Y_i - \alpha - \tau D_i - \beta \tilde{X}_i - \gamma D_i \tilde{X}_i\big)^2$$

Python implementation:

python
import numpy as np
import statsmodels.formula.api as smf

# Calculate weights
df['weight'] = np.maximum(0, 1 - np.abs(df['X_c']) / h)

# Weighted least squares
model_wls = smf.wls('Y ~ D + X_c + D:X_c', data=df, weights=df['weight']).fit()
print(model_wls.summary())

Bandwidth Selection Tradeoffs

Bias-Variance Tradeoff

Small bandwidth $h$:

  • Advantage: Low bias (more accurate functional approximation)
  • Disadvantage: High variance (small sample, unstable estimate)

Large bandwidth $h$:

  • Advantage: Low variance (large sample, stable estimate)
  • Disadvantage: High bias (may violate local linearity assumption)

Mean squared error (MSE):

$$\text{MSE}(h) = \text{Bias}\big(\hat{\tau}(h)\big)^2 + \text{Var}\big(\hat{\tau}(h)\big)$$

Optimal bandwidth $h^*$: the value of $h$ that minimizes MSE$(h)$.
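
To see the tradeoff numerically, here is a small Monte Carlo sketch under an assumed data-generating process whose conditional mean has different curvature on each side of the cutoff (chosen so the linear-approximation bias is visible); the bandwidths, sample size, and number of replications are arbitrary illustrative choices.

python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

TRUE_EFFECT = 10  # assumed true jump at the cutoff

def one_estimate(h, n=1000, seed=None):
    """Simulate one dataset (curvature differs across the cutoff) and return the local linear estimate."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0, 10, n)
    D = (X >= 0).astype(int)
    curvature = np.where(X >= 0, 0.02, -0.02)  # different curvature on each side (assumption)
    Y = 50 + 0.5 * X + curvature * X**2 + TRUE_EFFECT * D + rng.normal(0, 5, n)
    local = pd.DataFrame({'Y': Y, 'D': D, 'X_c': X})
    local = local[np.abs(local['X_c']) <= h]
    return smf.ols('Y ~ D + X_c + D:X_c', data=local).fit().params['D']

# Small h: little bias, large variance; large h: small variance, large bias
for h in [2, 10, 30]:
    est = np.array([one_estimate(h, seed=s) for s in range(200)])
    bias = est.mean() - TRUE_EFFECT
    print(f"h={h:>2}: bias={bias:+.2f}, sd={est.std():.2f}, mse={bias**2 + est.var():.2f}")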

Optimal Bandwidth Selection Methods

1. Imbens-Kalyanaraman (IK) Method (2012)

Approach: Based on asymptotic expansion of MSE, derive optimal bandwidth.

Formula (simplified; $C_K$ is a kernel-dependent constant):

$$h_{IK} = C_K \left[ \frac{\sigma^2(c)}{f(c)\,\big(m_+''(c) - m_-''(c)\big)^2} \right]^{1/5} n^{-1/5}$$

where:

  • $\sigma^2(c)$: Residual variance at the cutoff
  • $f(c)$: Density of the running variable at the cutoff
  • $m_-''(c)$, $m_+''(c)$: Second derivatives of the conditional outcome functions to the left and right of the cutoff

Python implementation:

python
from rdrobust import rdbwselect

# Data-driven MSE-optimal bandwidth (the 'mserd' selector follows the IK-style MSE criterion)
bw = rdbwselect(y=df['Y'], x=df['X'], c=cutoff, bwselect='mserd')
print(bw)

2. Calonico-Cattaneo-Titiunik (CCT) Method (2014)

Improvements:

  • Considers finite-sample bias correction
  • Provides robust confidence intervals

Two bandwidths:

  1. Main bandwidth $h$: For point estimation
  2. Bias bandwidth $b$: For estimating and correcting bias

Python implementation (rdrobust uses CCT by default):

python
from rdrobust import rdrobust

# CCT method (default)
result_cct = rdrobust(y=df['Y'], x=df['X'], c=cutoff)
print(result_cct)

3. Cross-Validation

Leave-one-out cross-validation:

  1. For each observation $i$, remove it from the sample
  2. Fit the model with the remaining data (using bandwidth $h$)
  3. Predict $\hat{Y}_{-i}(X_i)$
  4. Calculate the prediction error: $\big(Y_i - \hat{Y}_{-i}(X_i)\big)^2$
  5. Repeat for all observations and choose the $h$ that minimizes $CV(h) = \frac{1}{n}\sum_{i=1}^{n}\big(Y_i - \hat{Y}_{-i}(X_i)\big)^2$

Note: Use only data on one side of cutoff for cross-validation (avoid using jump itself).
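
Below is a minimal sketch of one reading of this recipe (illustrative only and slow for large samples; in practice the data-driven selectors above are preferred). It assumes a DataFrame df with columns Y and X_c as in the earlier snippets and, following the note above, cross-validates each side of the cutoff separately.

python
import numpy as np
import statsmodels.formula.api as smf

def cv_error(df, h):
    """Leave-one-out squared prediction error for bandwidth h, one side of the cutoff at a time."""
    sq_errors = []
    for side in (df[df['X_c'] < 0], df[df['X_c'] >= 0]):
        for i in side.index:
            # Training set: all other observations on the same side within h of observation i
            train = side.drop(index=i)
            train = train[np.abs(train['X_c'] - side.loc[i, 'X_c']) <= h]
            if len(train) < 5:
                continue  # skip points with too few neighbours
            fit = smf.ols('Y ~ X_c', data=train).fit()
            pred = fit.predict(side.loc[[i]]).iloc[0]
            sq_errors.append((side.loc[i, 'Y'] - pred) ** 2)
    return np.mean(sq_errors)

# Choose the candidate bandwidth with the smallest CV error (can be slow for large n)
candidates = [5, 10, 15, 20]
cv_scores = {h: cv_error(df, h) for h in candidates}
best_h = min(cv_scores, key=cv_scores.get)
print(f"CV-selected bandwidth: {best_h}")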


Complete Python Example: Sharp vs Fuzzy RDD

Example 1: Sharp RDD

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
from rdrobust import rdrobust, rdplot

# Setup
np.random.seed(42)
n = 2000
c = 0

# Generate running variable
X = np.random.normal(0, 10, n)

# Sharp RDD: treatment completely determined by cutoff
D = (X >= c).astype(int)

# Generate outcome variable
# DGP: Y = 50 + 0.5*X + 0.01*X^2 + 10*D + noise
true_effect = 10
Y = 50 + 0.5 * X + 0.01 * X**2 + true_effect * D + np.random.normal(0, 5, n)

df = pd.DataFrame({'X': X, 'D': D, 'Y': Y, 'X_c': X - c})

# Estimate (using rdrobust)
result_sharp = rdrobust(y=df['Y'], x=df['X'], c=c)
print("=" * 70)
print("Sharp RDD Results")
print("=" * 70)
print(result_sharp)

# Visualization
rdplot(y=df['Y'], x=df['X'], c=c,
       title='Sharp RDD',
       x_label='Running Variable',
       y_label='Outcome')
plt.show()

Example 2: Fuzzy RDD

python
# Fuzzy RDD: imperfect treatment
# Crossing cutoff, treatment probability jumps from 0.2 to 0.8
np.random.seed(42)

# Potential treatment state
prob_treat = 0.2 + 0.6 * (X >= c)  # 20% left, 80% right
D_fuzzy = np.random.binomial(1, prob_treat)

# Generate outcome variable (true effect still 10)
Y_fuzzy = 50 + 0.5 * X + 0.01 * X**2 + true_effect * D_fuzzy + np.random.normal(0, 5, n)

df_fuzzy = pd.DataFrame({'X': X, 'D': D_fuzzy, 'Y': Y_fuzzy, 'X_c': X - c})

# Fuzzy RDD estimation (passing the treatment via fuzzy= invokes the IV/Wald estimator)
result_fuzzy = rdrobust(y=df_fuzzy['Y'], x=df_fuzzy['X'], c=c, fuzzy=df_fuzzy['D'])
print("\n" + "=" * 70)
print("Fuzzy RDD Results")
print("=" * 70)
print(result_fuzzy)

# Check first stage
print("\nFirst Stage Check:")
first_stage = rdrobust(y=df_fuzzy['D'], x=df_fuzzy['X'], c=c)
print(f"Jump in treatment probability: {first_stage.coef[0]:.3f}")
print(f"F-statistic: {first_stage.z[0]**2:.2f}")

Example 3: Sensitivity Analysis with Different Bandwidths

python
# Try different bandwidths
bandwidths = [5, 10, 15, 20, 25]
results = []

for h in bandwidths:
    df_local = df[np.abs(df['X_c']) <= h]
    model = smf.ols('Y ~ D + X_c + D:X_c', data=df_local).fit()

    results.append({
        'bandwidth': h,
        'effect': model.params['D'],
        'se': model.bse['D'],
        'n': len(df_local)
    })

results_df = pd.DataFrame(results)

print("\n" + "=" * 70)
print("Bandwidth Sensitivity Analysis")
print("=" * 70)
print(results_df.to_string(index=False))

# Visualization
fig, ax = plt.subplots(figsize=(12, 6))
ax.errorbar(results_df['bandwidth'], results_df['effect'],
            yerr=1.96 * results_df['se'],
            fmt='o-', capsize=5, capthick=2, linewidth=2, markersize=8)
ax.axhline(y=true_effect, color='red', linestyle='--', linewidth=2,
           label=f'True Effect = {true_effect}')
ax.set_xlabel('Bandwidth', fontsize=13, fontweight='bold')
ax.set_ylabel('Estimated RDD Effect', fontsize=13, fontweight='bold')
ax.set_title('Sensitivity to Bandwidth Choice', fontsize=15, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Statistical Inference for RDD

Computing Standard Errors

Heteroskedasticity-robust standard errors (HC1/HC2):

python
model = smf.ols('Y ~ D + X_c + D:X_c', data=df_local).fit(cov_type='HC2')
print(model.summary())

Clustered standard errors (if clustered structure exists):

python
# Assuming data clustered at school level
model_cluster = smf.ols('Y ~ D + X_c + D:X_c', data=df_local).fit(
    cov_type='cluster', cov_kwds={'groups': df_local['school_id']}
)

Constructing Confidence Intervals

Conventional confidence interval (based on asymptotic normality):

$$\hat{\tau} \pm 1.96 \cdot \widehat{SE}(\hat{\tau})$$

Robust confidence interval (CCT method, considers finite-sample bias):

python
from rdrobust import rdrobust

# CCT robust confidence interval
result = rdrobust(y=df['Y'], x=df['X'], c=c)
print(f"Point estimate: {result.coef[0]:.3f}")
print(f"Robust 95% CI: [{result.ci[0][0]:.3f}, {result.ci[0][1]:.3f}]")

Bootstrap confidence interval:

python
import numpy as np
import statsmodels.formula.api as smf

def rdd_estimator(data, indices):
    """RDD estimator (for bootstrap)"""
    df_boot = data.iloc[indices]
    df_boot_local = df_boot[np.abs(df_boot['X_c']) <= h]
    model = smf.ols('Y ~ D + X_c + D:X_c', data=df_boot_local).fit()
    return model.params['D']

# Bootstrap (1000 resamples)
n_boot = 1000
boot_estimates = []
for _ in range(n_boot):
    indices = np.random.choice(len(df), len(df), replace=True)
    boot_estimates.append(rdd_estimator(df, indices))

boot_estimates = np.array(boot_estimates)
ci_lower = np.percentile(boot_estimates, 2.5)
ci_upper = np.percentile(boot_estimates, 97.5)

print(f"Bootstrap 95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")

Key Takeaways

Sharp RDD

  1. Identification condition: Continuity assumption (potential outcomes continuous at cutoff)
  2. Estimator: Jump in observed outcome at cutoff
  3. Causal interpretation: Local average treatment effect at cutoff (LATE)
  4. Best practice: Use local linear regression + automatic bandwidth selection (CCT)

Fuzzy RDD

  1. Essence: IV estimation, cutoff as instrument
  2. Identification conditions: Continuity + IV assumptions (relevance, exclusion, monotonicity)
  3. Estimator: Wald estimator (outcome jump / treatment jump)
  4. Causal interpretation: LATE for compliers
  5. Test: First stage must be strong (F > 10)

Bandwidth Selection

  1. Tradeoff: variance (small bandwidth) vs bias (large bandwidth)
  2. Optimal method: IK or CCT (automatic data-driven)
  3. Robustness: Report results under multiple bandwidths

Section Summary

In this section, we learned:

  • Rigorous mathematical derivation of Sharp RDD (from potential outcomes framework)
  • Deep connection between Fuzzy RDD and instrumental variables
  • Meaning and limitations of local average treatment effect (LATE)
  • Tradeoffs between parametric (polynomial) vs nonparametric (local linear) methods
  • Theory and practice of bandwidth selection (IK, CCT)
  • Complete Python implementation and robust inference

Next step: In Section 3, we will learn how to test RDD's core assumptions, including the continuity assumption, the density test (McCrary test), and covariate balance.


Solid theory ensures credible empirics!
