
11.2 RDD Fundamentals and Identification

"Regression discontinuity is one of the most credible quasi-experimental strategies."— Guido Imbens & Thomas Lemieux, RDD Review Authors

From potential outcomes framework to local randomization: the theoretical foundations of RDD


Section Overview

In this section, we will explore in depth:

  • Rigorous mathematical derivation of Sharp RDD
  • Relationship between Fuzzy RDD and instrumental variables (IV)
  • Meaning of local average treatment effect (LATE)
  • Parametric vs nonparametric estimation methods
  • Bandwidth selection tradeoffs

Identification Theory of Sharp RDD

Reviewing the Potential Outcomes Framework

In the Rubin Causal Model, each individual has two potential outcomes:

  • $Y_i(1)$: outcome if individual $i$ is treated
  • $Y_i(0)$: outcome if individual $i$ is untreated

Individual treatment effect:

$$\tau_i = Y_i(1) - Y_i(0)$$

Fundamental problem: We can never observe both $Y_i(1)$ and $Y_i(0)$ for the same individual!

Observed outcome:

$$Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$$

where $D_i \in \{0, 1\}$ is the treatment indicator.

RDD Identification Logic

Core idea: Near the cutoff $c$, treatment assignment is "quasi-random".

Sharp RDD assignment rule:

$$D_i = \mathbf{1}(X_i \geq c)$$

Key assumption: Continuity Assumption

For $d \in \{0, 1\}$, assume that the conditional expectation function $E[Y_i(d) \mid X_i = x]$ is continuous in $x$ at $x = c$.

Intuition:

  • At the cutoff $c$, the potential outcome functions $E[Y_i(0) \mid X_i = x]$ and $E[Y_i(1) \mid X_i = x]$ are continuous
  • In other words, small changes in the running variable $X_i$ near $c$ don't cause jumps in potential outcomes

Deriving the Identification Strategy

Observed conditional expectation:

$$E[Y_i \mid X_i = x] = \begin{cases} E[Y_i(1) \mid X_i = x] & \text{if } x \geq c \\ E[Y_i(0) \mid X_i = x] & \text{if } x < c \end{cases}$$

Right of cutoff ($x \geq c$, everyone treated):

$$\lim_{x \downarrow c} E[Y_i \mid X_i = x] = \lim_{x \downarrow c} E[Y_i(1) \mid X_i = x]$$

Left of cutoff ($x < c$, everyone untreated):

$$\lim_{x \uparrow c} E[Y_i \mid X_i = x] = \lim_{x \uparrow c} E[Y_i(0) \mid X_i = x]$$

RDD estimator:

$$\tau_{SRD} = \lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]$$

Why is this a causal effect?

According to the continuity assumption:

$$\tau_{SRD} = E[Y_i(1) \mid X_i = c] - E[Y_i(0) \mid X_i = c] = E[Y_i(1) - Y_i(0) \mid X_i = c]$$

Key points:

  1. RDD estimates the treatment effect at the cutoff, not the overall ATE
  2. This is a local effect (Local Average Treatment Effect, LATE)
  3. Extrapolation to other values of the running variable requires additional assumptions (e.g., effect homogeneity)
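
As a quick illustration of the estimator above (a minimal sketch, not the recommended rdrobust workflow shown later), the snippet below fits a separate linear regression on each side of the cutoff and takes the difference of the two intercepts as the jump. It assumes a DataFrame df with outcome Y and a centered running variable X_c = X - c, as constructed in the examples later in this section; the bandwidth of 10 is an arbitrary choice.

python
import statsmodels.formula.api as smf

h = 10  # illustrative bandwidth; see the bandwidth discussion below

# Observations within h of the cutoff, split by side (X_c = X - c is assumed)
left = df[(df['X_c'] >= -h) & (df['X_c'] < 0)]
right = df[(df['X_c'] >= 0) & (df['X_c'] <= h)]

# Each intercept approximates the one-sided limit of E[Y | X = x] at x = c
lim_left = smf.ols('Y ~ X_c', data=left).fit().params['Intercept']
lim_right = smf.ols('Y ~ X_c', data=right).fit().params['Intercept']

tau_srd = lim_right - lim_left  # estimated jump at the cutoff
print(f"Sharp RDD estimate: {tau_srd:.2f}")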

Fuzzy RDD: Imperfect Cutoff Rules

Motivation for Fuzzy RDD

In reality, treatment assignment rules may be imperfect:

Example 1: College admission

  • Rule: Exam score ≥ 600 → Admitted
  • Reality: Special admissions (sports, arts, principal recommendations, etc.)
  • Result: the admission probability jumps at 600 but not from 0 to 1 (for example, $P(D_i = 1 \mid X_i < 600) > 0$ because of special admissions)

Example 2: Medicare

  • Rule: Age ≥ 65 → Automatically enrolled in Medicare
  • Reality: Some purchase early, some choose not to participate
  • Result: Treatment jumps at cutoff but not from 0 to 1

Definition of Fuzzy RDD

Sharp RDD:

$$P(D_i = 1 \mid X_i = x) = \mathbf{1}(x \geq c)$$

Fuzzy RDD:

$$\lim_{x \downarrow c} P(D_i = 1 \mid X_i = x) \neq \lim_{x \uparrow c} P(D_i = 1 \mid X_i = x)$$

But the two limits are not necessarily a perfect 0 and 1.

Key condition:

$$\lim_{x \downarrow c} P(D_i = 1 \mid X_i = x) - \lim_{x \uparrow c} P(D_i = 1 \mid X_i = x) \neq 0$$

That is: Treatment probability jumps at the cutoff.

Fuzzy RDD Identification: Instrumental Variables Approach

Core idea: Use the cutoff indicator as an instrumental variable (IV)!

Define instrument:

$$Z_i = \mathbf{1}(X_i \geq c)$$

Three key IV conditions:

  1. Relevance: $Z_i$ is correlated with treatment $D_i$

  2. Exclusion: $Z_i$ affects $Y_i$ only through $D_i$ (after controlling for $X_i$)

  3. Monotonicity: Crossing the cutoff doesn't make anyone switch from treated to untreated

Fuzzy RDD estimator:

Reduced Form:

$$\lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]$$

First Stage:

$$\lim_{x \downarrow c} E[D_i \mid X_i = x] - \lim_{x \uparrow c} E[D_i \mid X_i = x]$$

Fuzzy RDD effect (Wald estimator):

$$\tau_{FRD} = \frac{\lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]}{\lim_{x \downarrow c} E[D_i \mid X_i = x] - \lim_{x \uparrow c} E[D_i \mid X_i = x]}$$
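
To make the Wald logic concrete, here is a minimal sketch (not the rdrobust implementation used later): within a bandwidth it estimates the reduced-form jump in Y and the first-stage jump in D with local linear regressions and takes their ratio. It assumes a DataFrame with columns Y, D, and a centered running variable X_c, as in the fuzzy example later in this section; the function name and bandwidth are illustrative.

python
import numpy as np
import statsmodels.formula.api as smf

def fuzzy_rdd_wald(df, h):
    """Wald estimator: outcome jump divided by treatment-probability jump."""
    local = df[np.abs(df['X_c']) <= h].copy()
    local['Z'] = (local['X_c'] >= 0).astype(int)  # cutoff indicator as instrument

    # Reduced form: jump in the outcome at the cutoff
    reduced_form = smf.ols('Y ~ Z + X_c + Z:X_c', data=local).fit().params['Z']
    # First stage: jump in the treatment probability at the cutoff
    first_stage = smf.ols('D ~ Z + X_c + Z:X_c', data=local).fit().params['Z']

    return reduced_form / first_stage

# Example usage with the fuzzy data generated in Example 2 below:
# tau_frd = fuzzy_rdd_wald(df_fuzzy, h=10)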

Causal Interpretation of Fuzzy RDD

Local Average Treatment Effect (LATE):

Fuzzy RDD estimates the treatment effect for compliers.

Four types of individuals (based on the potential treatment states $D_i(1)$ and $D_i(0)$, i.e., treatment status when above vs below the cutoff):

  1. Always-takers: $D_i(1) = D_i(0) = 1$ (treated regardless of the cutoff)
  2. Never-takers: $D_i(1) = D_i(0) = 0$ (untreated regardless of the cutoff)
  3. Compliers: $D_i(1) = 1,\ D_i(0) = 0$ (treated only if they cross the cutoff)
  4. Defiers: $D_i(1) = 0,\ D_i(0) = 1$ (untreated if they cross the cutoff, treated otherwise)

Monotonicity assumption rules out the existence of Defiers.

LATE theorem (Imbens & Angrist 1994):

Interpretation:

  • Fuzzy RDD estimates the average treatment effect for compliers (those whose treatment status changes when they cross the cutoff)
  • For Always-takers and Never-takers, we cannot identify treatment effects
  • If compliers differ from the overall population, LATE ≠ ATE

Parametric vs Nonparametric Estimation Methods

Method 1: Global Polynomial Regression

Model ($p$-th order polynomial, with centered running variable $\tilde{X}_i = X_i - c$):

$$Y_i = \alpha + \tau D_i + \sum_{k=1}^{p} \beta_k \tilde{X}_i^k + \sum_{k=1}^{p} \gamma_k D_i \tilde{X}_i^k + \varepsilon_i$$

Advantages:

  • Simple and intuitive
  • Uses all data, high efficiency

Disadvantages:

  • Sensitive to functional form assumptions (the choice of polynomial order $p$ is critical)
  • Data far from the cutoff can bias estimates
  • Gelman & Imbens (2019) warn: do not use high-order global polynomials!

Python implementation:

python
import statsmodels.formula.api as smf

# Second-order polynomial
model = smf.ols('Y ~ D + X_c + I(X_c**2) + D:X_c + D:I(X_c**2)', data=df).fit()
print(model.summary())

Method 2: Local Linear Regression

Approach: Use only data within a bandwidth $h$ of the cutoff and fit a linear model.

Model (for $|\tilde{X}_i| \leq h$):

$$Y_i = \alpha + \tau D_i + \beta \tilde{X}_i + \gamma D_i \tilde{X}_i + \varepsilon_i$$

Bandwidth $h$:

  • Too small: High variance (less data)
  • Too large: High bias (functional form assumptions may be wrong)

Advantages:

  • Weakest functional form assumptions (only locally linear)
  • Theoretically well-behaved at boundary points such as the cutoff (Fan & Gijbels 1996)
  • Modern software (like rdrobust) automatically selects optimal bandwidth

Python implementation:

python
# Manual bandwidth selection
h = 10
df_local = df[np.abs(df['X_c']) <= h]
model_local = smf.ols('Y ~ D + X_c + D:X_c', data=df_local).fit()

# Using rdrobust (automatic bandwidth)
from rdrobust import rdrobust
result = rdrobust(y=df['Y'], x=df['X'], c=cutoff)
print(result)

Method 3: Kernel-Weighted Local Linear Regression

Further improvement: Give higher weight to observations closer to cutoff.

Triangular kernel function:

$$K(u) = \max(0,\ 1 - |u|)$$

Weight:

$$w_i = K\!\left(\frac{X_i - c}{h}\right) = \max\!\left(0,\ 1 - \frac{|X_i - c|}{h}\right)$$

Weighted regression:

$$\min_{\alpha, \tau, \beta, \gamma} \sum_i w_i \big(Y_i - \alpha - \tau D_i - \beta \tilde{X}_i - \gamma D_i \tilde{X}_i\big)^2$$

Python implementation:

python
import numpy as np
import statsmodels.formula.api as smf

# Calculate weights
df['weight'] = np.maximum(0, 1 - np.abs(df['X_c']) / h)

# Weighted least squares
model_wls = smf.wls('Y ~ D + X_c + D:X_c', data=df, weights=df['weight']).fit()
print(model_wls.summary())

Bandwidth Selection Tradeoffs

Bias-Variance Tradeoff

Small bandwidth $h$:

  • Advantage: Low bias (more accurate functional approximation)
  • Disadvantage: High variance (small sample, unstable estimate)

Large bandwidth $h$:

  • Advantage: Low variance (large sample, stable estimate)
  • Disadvantage: High bias (may violate local linearity assumption)

Mean squared error (MSE):

$$\text{MSE}(h) = \text{Bias}\big(\hat{\tau}(h)\big)^2 + \text{Var}\big(\hat{\tau}(h)\big)$$

Optimal bandwidth $h^*$: the value of $h$ that minimizes MSE$(h)$.
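
To see the tradeoff numerically, here is a small Monte Carlo sketch under an assumed data-generating process whose conditional mean has different curvature on each side of the cutoff (chosen so the linear-approximation bias is visible); the bandwidths, sample size, and number of replications are arbitrary illustrative choices.

python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

TRUE_EFFECT = 10  # assumed true jump at the cutoff

def one_estimate(h, n=1000, seed=None):
    """Simulate one dataset (curvature differs across the cutoff) and return the local linear estimate."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0, 10, n)
    D = (X >= 0).astype(int)
    curvature = np.where(X >= 0, 0.02, -0.02)  # different curvature on each side (assumption)
    Y = 50 + 0.5 * X + curvature * X**2 + TRUE_EFFECT * D + rng.normal(0, 5, n)
    local = pd.DataFrame({'Y': Y, 'D': D, 'X_c': X})
    local = local[np.abs(local['X_c']) <= h]
    return smf.ols('Y ~ D + X_c + D:X_c', data=local).fit().params['D']

# Small h: little bias, large variance; large h: small variance, large bias
for h in [2, 10, 30]:
    est = np.array([one_estimate(h, seed=s) for s in range(200)])
    bias = est.mean() - TRUE_EFFECT
    print(f"h={h:>2}: bias={bias:+.2f}, sd={est.std():.2f}, mse={bias**2 + est.var():.2f}")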

Optimal Bandwidth Selection Methods

1. Imbens-Kalyanaraman (IK) Method (2012)

Approach: Based on asymptotic expansion of MSE, derive optimal bandwidth.

Formula (simplified; $C_K$ is a kernel-dependent constant):

$$h_{IK} = C_K \left[ \frac{\sigma^2(c)}{f(c)\,\big(m_+''(c) - m_-''(c)\big)^2} \right]^{1/5} n^{-1/5}$$

where:

  • $\sigma^2(c)$: Residual variance at the cutoff
  • $f(c)$: Density of the running variable at the cutoff
  • $m_-''(c)$, $m_+''(c)$: Second derivatives of the conditional outcome functions to the left and right of the cutoff

Python implementation:

python
from rdrobust import rdbwselect

# Data-driven MSE-optimal bandwidth (the 'mserd' selector follows the IK-style MSE criterion)
bw = rdbwselect(y=df['Y'], x=df['X'], c=cutoff, bwselect='mserd')
print(bw)

2. Calonico-Cattaneo-Titiunik (CCT) Method (2014)

Improvements:

  • Considers finite-sample bias correction
  • Provides robust confidence intervals

Two bandwidths:

  1. Main bandwidth $h$: For point estimation
  2. Bias bandwidth $b$: For estimating and correcting bias

Python implementation (rdrobust uses CCT by default):

python
from rdrobust import rdrobust

# CCT method (default)
result_cct = rdrobust(y=df['Y'], x=df['X'], c=cutoff)
print(result_cct)

3. Cross-Validation

Leave-one-out cross-validation:

  1. For each observation $i$, remove it from the sample
  2. Fit the model with the remaining data (using bandwidth $h$)
  3. Predict $\hat{Y}_{-i}(X_i)$
  4. Calculate the prediction error: $\big(Y_i - \hat{Y}_{-i}(X_i)\big)^2$
  5. Repeat for all observations and choose the $h$ that minimizes $CV(h) = \frac{1}{n}\sum_{i=1}^{n}\big(Y_i - \hat{Y}_{-i}(X_i)\big)^2$

Note: Use only data on one side of cutoff for cross-validation (avoid using jump itself).
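
Below is a minimal sketch of one reading of this recipe (illustrative only and slow for large samples; in practice the data-driven selectors above are preferred). It assumes a DataFrame df with columns Y and X_c as in the earlier snippets and, following the note above, cross-validates each side of the cutoff separately.

python
import numpy as np
import statsmodels.formula.api as smf

def cv_error(df, h):
    """Leave-one-out squared prediction error for bandwidth h, one side of the cutoff at a time."""
    sq_errors = []
    for side in (df[df['X_c'] < 0], df[df['X_c'] >= 0]):
        for i in side.index:
            # Training set: all other observations on the same side within h of observation i
            train = side.drop(index=i)
            train = train[np.abs(train['X_c'] - side.loc[i, 'X_c']) <= h]
            if len(train) < 5:
                continue  # skip points with too few neighbours
            fit = smf.ols('Y ~ X_c', data=train).fit()
            pred = fit.predict(side.loc[[i]]).iloc[0]
            sq_errors.append((side.loc[i, 'Y'] - pred) ** 2)
    return np.mean(sq_errors)

# Choose the candidate bandwidth with the smallest CV error (can be slow for large n)
candidates = [5, 10, 15, 20]
cv_scores = {h: cv_error(df, h) for h in candidates}
best_h = min(cv_scores, key=cv_scores.get)
print(f"CV-selected bandwidth: {best_h}")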


Complete Python Example: Sharp vs Fuzzy RDD

Example 1: Sharp RDD

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
from rdrobust import rdrobust, rdplot

# Setup
np.random.seed(42)
n = 2000
c = 0

# Generate running variable
X = np.random.normal(0, 10, n)

# Sharp RDD: treatment completely determined by cutoff
D = (X >= c).astype(int)

# Generate outcome variable
# DGP: Y = 50 + 0.5*X + 0.01*X^2 + 10*D + noise
true_effect = 10
Y = 50 + 0.5 * X + 0.01 * X**2 + true_effect * D + np.random.normal(0, 5, n)

df = pd.DataFrame({'X': X, 'D': D, 'Y': Y, 'X_c': X - c})

# Estimate (using rdrobust)
result_sharp = rdrobust(y=df['Y'], x=df['X'], c=c)
print("=" * 70)
print("Sharp RDD Results")
print("=" * 70)
print(result_sharp)

# Visualization
rdplot(y=df['Y'], x=df['X'], c=c,
       title='Sharp RDD',
       x_label='Running Variable',
       y_label='Outcome')
plt.show()

Example 2: Fuzzy RDD

python
# Fuzzy RDD: imperfect treatment
# Crossing cutoff, treatment probability jumps from 0.2 to 0.8
np.random.seed(42)

# Potential treatment state
prob_treat = 0.2 + 0.6 * (X >= c)  # 20% left, 80% right
D_fuzzy = np.random.binomial(1, prob_treat)

# Generate outcome variable (true effect still 10)
Y_fuzzy = 50 + 0.5 * X + 0.01 * X**2 + true_effect * D_fuzzy + np.random.normal(0, 5, n)

df_fuzzy = pd.DataFrame({'X': X, 'D': D_fuzzy, 'Y': Y_fuzzy, 'X_c': X - c})

# Fuzzy RDD estimation (passing the treatment via fuzzy= invokes the IV/Wald estimator)
result_fuzzy = rdrobust(y=df_fuzzy['Y'], x=df_fuzzy['X'], c=c, fuzzy=df_fuzzy['D'])
print("\n" + "=" * 70)
print("Fuzzy RDD Results")
print("=" * 70)
print(result_fuzzy)

# Check first stage
print("\nFirst Stage Check:")
first_stage = rdrobust(y=df_fuzzy['D'], x=df_fuzzy['X'], c=c)
print(f"Jump in treatment probability: {first_stage.coef[0]:.3f}")
print(f"F-statistic: {first_stage.z[0]**2:.2f}")

Example 3: Sensitivity Analysis with Different Bandwidths

python
# Try different bandwidths
bandwidths = [5, 10, 15, 20, 25]
results = []

for h in bandwidths:
    df_local = df[np.abs(df['X_c']) <= h]
    model = smf.ols('Y ~ D + X_c + D:X_c', data=df_local).fit()

    results.append({
        'bandwidth': h,
        'effect': model.params['D'],
        'se': model.bse['D'],
        'n': len(df_local)
    })

results_df = pd.DataFrame(results)

print("\n" + "=" * 70)
print("Bandwidth Sensitivity Analysis")
print("=" * 70)
print(results_df.to_string(index=False))

# Visualization
fig, ax = plt.subplots(figsize=(12, 6))
ax.errorbar(results_df['bandwidth'], results_df['effect'],
            yerr=1.96 * results_df['se'],
            fmt='o-', capsize=5, capthick=2, linewidth=2, markersize=8)
ax.axhline(y=true_effect, color='red', linestyle='--', linewidth=2,
           label=f'True Effect = {true_effect}')
ax.set_xlabel('Bandwidth', fontsize=13, fontweight='bold')
ax.set_ylabel('Estimated RDD Effect', fontsize=13, fontweight='bold')
ax.set_title('Sensitivity to Bandwidth Choice', fontsize=15, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Statistical Inference for RDD

Computing Standard Errors

Heteroskedasticity-robust standard errors (HC1/HC2):

python
model = smf.ols('Y ~ D + X_c + D:X_c', data=df_local).fit(cov_type='HC2')
print(model.summary())

Clustered standard errors (if clustered structure exists):

python
# Assuming data clustered at school level
model_cluster = smf.ols('Y ~ D + X_c + D:X_c', data=df_local).fit(
    cov_type='cluster', cov_kwds={'groups': df_local['school_id']}
)

Constructing Confidence Intervals

Conventional confidence interval (based on asymptotic normality):

$$\hat{\tau} \pm 1.96 \cdot \widehat{SE}(\hat{\tau})$$

Robust confidence interval (CCT method, considers finite-sample bias):

python
from rdrobust import rdrobust

# CCT robust confidence interval
result = rdrobust(y=df['Y'], x=df['X'], c=c)
print(f"Point estimate: {result.coef[0]:.3f}")
print(f"Robust 95% CI: [{result.ci[0][0]:.3f}, {result.ci[0][1]:.3f}]")

Bootstrap confidence interval:

python
import numpy as np
import statsmodels.formula.api as smf

def rdd_estimator(data, indices):
    """RDD estimator (for bootstrap)"""
    df_boot = data.iloc[indices]
    df_boot_local = df_boot[np.abs(df_boot['X_c']) <= h]
    model = smf.ols('Y ~ D + X_c + D:X_c', data=df_boot_local).fit()
    return model.params['D']

# Bootstrap (1000 resamples)
n_boot = 1000
boot_estimates = []
for _ in range(n_boot):
    indices = np.random.choice(len(df), len(df), replace=True)
    boot_estimates.append(rdd_estimator(df, indices))

boot_estimates = np.array(boot_estimates)
ci_lower = np.percentile(boot_estimates, 2.5)
ci_upper = np.percentile(boot_estimates, 97.5)

print(f"Bootstrap 95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")

Key Takeaways

Sharp RDD

  1. Identification condition: Continuity assumption (potential outcomes continuous at cutoff)
  2. Estimator: Jump in observed outcome at cutoff
  3. Causal interpretation: Local average treatment effect at cutoff (LATE)
  4. Best practice: Use local linear regression + automatic bandwidth selection (CCT)

Fuzzy RDD

  1. Essence: IV estimation, cutoff as instrument
  2. Identification conditions: Continuity + IV assumptions (relevance, exclusion, monotonicity)
  3. Estimator: Wald estimator (outcome jump / treatment jump)
  4. Causal interpretation: LATE for compliers
  5. Test: First stage must be strong (F > 10)

Bandwidth Selection

  1. Tradeoff: variance (small bandwidth) vs bias (large bandwidth)
  2. Optimal method: IK or CCT (automatic data-driven)
  3. Robustness: Report results under multiple bandwidths

Section Summary

In this section, we learned:

  • Rigorous mathematical derivation of Sharp RDD (from potential outcomes framework)
  • Deep connection between Fuzzy RDD and instrumental variables
  • Meaning and limitations of local average treatment effect (LATE)
  • Tradeoffs between parametric (polynomial) vs nonparametric (local linear) methods
  • Theory and practice of bandwidth selection (IK, CCT)
  • Complete Python implementation and robust inference

Next step: In Section 3, we will learn how to test RDD's core assumptions, including the continuity assumption, the density test (McCrary test), and covariate balance.


Solid theory ensures credible empirics!
