11.3 Continuity Assumption and Validity Tests
"The validity of RDD rests on the continuity assumption."— David Lee, Econometrician
Assumptions cannot be tested directly, but they can be verified indirectly
Section Overview
RDD's causal identification relies on the continuity assumption, but this assumption cannot be tested directly because it involves unobserved counterfactuals. This section introduces:
- Meaning and threats to the continuity assumption
- Covariate balance tests
- Density test (McCrary Density Test)
- Placebo tests
- Complete Python implementation
Continuity Assumption: The Foundation of RDD
Formal Definition
Continuity Assumption: At the cutoff $c$, the conditional expectations of the potential outcomes, $E[Y(0) \mid X = x]$ and $E[Y(1) \mid X = x]$, are continuous in $x$ at $x = c$.
Mathematical expression:
$$\lim_{x \to c^{-}} E[Y(0) \mid X = x] = \lim_{x \to c^{+}} E[Y(0) \mid X = x], \qquad \lim_{x \to c^{-}} E[Y(1) \mid X = x] = \lim_{x \to c^{+}} E[Y(1) \mid X = x]$$
In plain language:
- Without treatment, the outcome variable would evolve smoothly through the cutoff (no jump)
- In other words: small changes in the running variable do not produce sudden changes in potential outcomes
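Why this matters for identification: if both potential-outcome functions are continuous at $c$, the observed jump in $E[Y \mid X = x]$ at the cutoff equals the treatment effect at the cutoff. A standard sharp-RDD derivation (using the sharp assignment rule $D_i = \mathbf{1}\{X_i \ge c\}$):
$$\begin{aligned} \tau_{\text{SRD}} &= E[Y(1) - Y(0) \mid X = c] \\ &= \lim_{x \to c^{+}} E[Y(1) \mid X = x] - \lim_{x \to c^{-}} E[Y(0) \mid X = x] && \text{(continuity)} \\ &= \lim_{x \to c^{+}} E[Y \mid X = x] - \lim_{x \to c^{-}} E[Y \mid X = x] && \text{(sharp assignment)} \end{aligned}$$
The last line contains only observable quantities, which is exactly what the estimators in this chapter target.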
Why is the Continuity Assumption Credible?
Local randomization perspective:
Near the cutoff, individuals should be "similar":
- A student scoring 599 vs. a student scoring 600
- Their ability, family background, and study habits should be almost identical
- The only difference: one just crossed the cutoff
Formalization: If individuals near the cutoff are balanced on all covariates, their potential outcomes should also be balanced.
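A quick way to see this in data is to compare covariate means on either side of the cutoff as the window shrinks. The sketch below uses simulated exam scores (hypothetical numbers, not this section's dataset); because "ability" varies smoothly with the score, the gap between sides vanishes as the window narrows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
score = rng.normal(550, 40, 50_000)                # hypothetical exam scores
ability = 0.02 * score + rng.normal(0, 1, 50_000)  # covariate, smooth in score

# Compare mean ability just below vs. just above the 600 cutoff,
# shrinking the comparison window around the cutoff
for w in [50, 20, 5, 1]:
    below = ability[(score >= 600 - w) & (score < 600)]
    above = ability[(score >= 600) & (score < 600 + w)]
    t_stat, p = stats.ttest_ind(below, above)
    print(f"window +/-{w:>2}: diff = {above.mean() - below.mean():+.4f}, p = {p:.3f}")
```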
Threats to the Continuity Assumption
Threat 1: Precise Manipulation
Problem: Individuals can precisely manipulate the running variable so that they just cross the cutoff.
Examples:
- Exam cheating: Students know 600 is cutoff, cheat to get exactly 600
- Election fraud: Candidates manipulate votes to get just over 50%
- Firm lobbying: Companies lobby government to keep size just below regulatory threshold
Consequence:
- Individuals just right of the cutoff systematically differ from those just left of it (selection bias)
- Potential outcomes may then jump at the cutoff (violating continuity)
How to detect:
- McCrary density test: check for abnormal bunching of the running variable at the cutoff (see the simulation sketch below)
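To make this threat concrete, here is a small simulation (hypothetical numbers, not this section's dataset) in which some students just below the cutoff manage to push their scores over it. The histogram shows the telltale pattern the McCrary test looks for: a dip just below the cutoff and a spike just above it:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
score = rng.normal(550, 40, 20_000)

# Hypothetical manipulation: 60% of students within 5 points below the
# cutoff nudge their score to just above 600
near_miss = (score >= 595) & (score < 600)
pushed = near_miss & (rng.random(20_000) < 0.6)
score[pushed] = rng.uniform(600, 602, pushed.sum())

plt.hist(score, bins=120)
plt.axvline(600, color='green', linestyle='--', linewidth=2)
plt.title('Simulated manipulation: bunching just above the cutoff')
plt.xlabel('Running variable (score)')
plt.ylabel('Frequency')
plt.show()
```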
Threat 2: Other Policies Change Simultaneously at Cutoff
Problem: Besides the treatment we care about, other policies also change at cutoff.
Example:
- A score of 600 not only wins the scholarship but also grants admission to an honors class
- If we observe a GPA increase, we cannot distinguish the scholarship effect from the honors-class effect
Consequence:
- RDD estimates the combined effect of all factors that change at cutoff
- Cannot isolate individual treatment effect
How to avoid:
- Carefully study institutional background, ensure only one policy changes at cutoff
- Or explicitly acknowledge estimating "bundled policy" effect
Threat 3: Running Variable Directly Affects Outcome
Problem: The running variable $X$ affects the outcome $Y$ directly, not only through the treatment $D$.
Example:
- Exam score itself (beyond scholarship) also affects student confidence
- Age itself (beyond Medicare) also affects health behaviors
Consequence:
- If the direct effect of $X$ is discontinuous at the cutoff, the continuity assumption is violated
- But as long as the direct effect of $X$ is smooth in $x$, the continuity assumption still holds
Test:
- Covariate balance tests (if covariates balanced, running variable should be "quasi-random")
Validity Test 1: Covariate Balance Tests
Core Idea
Logic:
- If treatment near cutoff is "quasi-random"
- Then all baseline covariates should be balanced (continuous) at cutoff
Test: For each covariate $Z$, run a "pseudo-RDD" with $Z$ as the outcome:
$$Z_i = \alpha + \tau_Z D_i + \beta (X_i - c) + \gamma D_i (X_i - c) + \varepsilon_i$$
Null hypothesis: $H_0: \tau_Z = 0$ (no jump in the covariate at the cutoff)
If we reject $H_0$:
- Covariate is imbalanced at cutoff
- Possible selection bias or manipulation
- Continuity assumption is questioned
Python Implementation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf
from rdrobust import rdrobust
# Suppose we have the following covariates
# X: running variable, D: treatment, Y: outcome
# Z1: age, Z2: gender, Z3: family income
def covariate_balance_test(df, covariate, cutoff, bandwidth=None):
"""
Covariate balance test
Parameters:
- df: dataframe
- covariate: covariate name
- cutoff: cutoff
- bandwidth: bandwidth (if None, use full sample)
"""
df_test = df.copy()
df_test['X_c'] = df_test['X'] - cutoff
if bandwidth is not None:
df_test = df_test[np.abs(df_test['X_c']) <= bandwidth]
# Run RDD (covariate as outcome variable)
model = smf.ols(f'{covariate} ~ D + X_c + D:X_c', data=df_test).fit()
# Extract results
jump = model.params['D']
se = model.bse['D']
pvalue = model.pvalues['D']
return {
'covariate': covariate,
'jump': jump,
'se': se,
'pvalue': pvalue,
'significant': pvalue < 0.05
}
# Example data
np.random.seed(42)
n = 2000
X = np.random.normal(0, 10, n)
D = (X >= 0).astype(int)
# Generate covariates (should be balanced at cutoff)
age = 25 + 0.1 * X + np.random.normal(0, 3, n)
gender = np.random.binomial(1, 0.5, n) # Independent of X
income = 50000 + 500 * X + np.random.normal(0, 10000, n)
# Generate outcome variable
Y = 50 + 0.5 * X + 10 * D + np.random.normal(0, 5, n)
df = pd.DataFrame({
'X': X, 'D': D, 'Y': Y,
'age': age, 'gender': gender, 'income': income
})
# Test all covariates for balance
covariates = ['age', 'gender', 'income']
balance_results = []
for cov in covariates:
result = covariate_balance_test(df, cov, cutoff=0, bandwidth=20)
balance_results.append(result)
balance_df = pd.DataFrame(balance_results)
print("=" * 70)
print("Covariate Balance Tests")
print("=" * 70)
print(balance_df.to_string(index=False))
print("\nInterpretation: p-value > 0.05 indicates covariate is balanced at cutoff (good)")Visualizing Covariate Balance
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for i, cov in enumerate(covariates):
ax = axes[i]
    # rdplot (from the rdrobust package) could draw each covariate, but it
    # creates a separate figure per call, so we bin and plot manually to keep
    # all three panels in one figure
# Binning
bins = pd.cut(df['X'], bins=20)
df_binned = df.groupby([bins, 'D']).agg({cov: 'mean', 'X': 'mean'}).reset_index()
df_left = df_binned[df_binned['D'] == 0]
df_right = df_binned[df_binned['D'] == 1]
ax.scatter(df_left['X'], df_left[cov], color='blue', s=50, alpha=0.6, label='Control')
ax.scatter(df_right['X'], df_right[cov], color='red', s=50, alpha=0.6, label='Treated')
# Fitted lines
df_local = df[np.abs(df['X']) <= 20]
df_local_left = df_local[df_local['D'] == 0]
df_local_right = df_local[df_local['D'] == 1]
from sklearn.linear_model import LinearRegression
if len(df_local_left) > 0:
lr_left = LinearRegression().fit(df_local_left[['X']], df_local_left[cov])
X_range_left = np.linspace(df_local_left['X'].min(), 0, 100)
ax.plot(X_range_left, lr_left.predict(X_range_left.reshape(-1, 1)),
color='blue', linewidth=2)
if len(df_local_right) > 0:
lr_right = LinearRegression().fit(df_local_right[['X']], df_local_right[cov])
X_range_right = np.linspace(0, df_local_right['X'].max(), 100)
ax.plot(X_range_right, lr_right.predict(X_range_right.reshape(-1, 1)),
color='red', linewidth=2)
ax.axvline(x=0, color='green', linestyle='--', linewidth=2)
ax.set_title(f'{cov} (p={balance_df.loc[i, "pvalue"]:.3f})', fontsize=12, fontweight='bold')
ax.set_xlabel('Running Variable', fontsize=11)
ax.set_ylabel(cov, fontsize=11)
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Interpretation:
- If a covariate shows no obvious jump at the cutoff → passes the balance test (good)
- If a covariate jumps at the cutoff → possible manipulation or selection bias (bad); see the multiple-testing caveat below
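One caveat when reading a table of balance tests: with many covariates, roughly 5% will "fail" at the 5% level by pure chance, so a single marginal rejection is not by itself damning. A minimal sketch of a joint adjustment, applying statsmodels' multiple-testing helper to the balance_df computed above:

```python
from statsmodels.stats.multitest import multipletests

# Holm step-down adjustment across all covariate balance p-values
reject, p_adj, _, _ = multipletests(balance_df['pvalue'], alpha=0.05, method='holm')
balance_df['pvalue_holm'] = p_adj
balance_df['reject_holm'] = reject
print(balance_df[['covariate', 'pvalue', 'pvalue_holm', 'reject_holm']].to_string(index=False))
```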
Validity Test 2: Density Test (McCrary Test)
Core Idea
Logic:
- If individuals can precisely manipulate running variable, they will bunch at cutoff
- Example: Students cheating to get exactly 600 points
- This causes density function of running variable to be discontinuous at cutoff
McCrary (2008) test:
$$H_0: \lim_{x \to c^{-}} f(x) = \lim_{x \to c^{+}} f(x)$$
where $f(x)$ is the density of the running variable $X$.
If we reject $H_0$:
- Density jumps at cutoff
- Possible manipulation
- Continuity assumption seriously threatened
Implementation of McCrary Test
Steps:
- Divide running variable into small bins
- Calculate frequency in each bin (construct histogram)
- Fit smooth curves separately on left and right of cutoff (kernel density or local linear)
- Test whether the estimated density jumps at the cutoff
Test statistic (log difference in density estimates):
$$\hat{\theta} = \ln \hat{f}_{+}(c) - \ln \hat{f}_{-}(c)$$
where:
- $\hat{f}_{+}(c)$: density estimate just right of the cutoff
- $\hat{f}_{-}(c)$: density estimate just left of the cutoff
Python Implementation (using rddensity package)
# Install rddensity
# pip install rddensity
from rddensity import rddensity
# McCrary density test
density_test = rddensity(X=df['X'], c=0)
print("=" * 70)
print("McCrary Density Test")
print("=" * 70)
print(density_test)
print(f"\np-value: {density_test.pval[0]:.4f}")
print("Interpretation: p-value > 0.05 indicates density is continuous (no manipulation evidence)")Manual Implementation of Density Test (Simplified)
def mccrary_density_test(X, c, bandwidth=None, n_bins=30):
"""
Simplified McCrary density test
Parameters:
- X: running variable
- c: cutoff
- bandwidth: bandwidth
- n_bins: number of histogram bins
"""
# Create histogram
counts, bin_edges = np.histogram(X, bins=n_bins)
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
# Separate left and right
left_mask = bin_centers < c
right_mask = bin_centers >= c
bin_centers_left = bin_centers[left_mask]
counts_left = counts[left_mask]
bin_centers_right = bin_centers[right_mask]
counts_right = counts[right_mask]
# Fit local linear (simplified: use OLS)
from sklearn.linear_model import LinearRegression
if len(bin_centers_left) > 1:
lr_left = LinearRegression().fit(
bin_centers_left.reshape(-1, 1), counts_left
)
density_left_at_c = lr_left.predict([[c]])[0]
else:
density_left_at_c = 0
if len(bin_centers_right) > 1:
lr_right = LinearRegression().fit(
bin_centers_right.reshape(-1, 1), counts_right
)
density_right_at_c = lr_right.predict([[c]])[0]
else:
density_right_at_c = 0
# Calculate jump
jump = density_right_at_c - density_left_at_c
# Visualization
fig, ax = plt.subplots(figsize=(12, 6))
# Histogram
ax.bar(bin_centers_left, counts_left, width=np.diff(bin_edges)[0],
alpha=0.5, color='blue', edgecolor='black', label='Left of cutoff')
ax.bar(bin_centers_right, counts_right, width=np.diff(bin_edges)[0],
alpha=0.5, color='red', edgecolor='black', label='Right of cutoff')
# Fitted lines
if len(bin_centers_left) > 1:
X_left_range = np.linspace(bin_centers_left.min(), c, 100)
ax.plot(X_left_range, lr_left.predict(X_left_range.reshape(-1, 1)),
color='blue', linewidth=3, label='Fitted (left)')
if len(bin_centers_right) > 1:
X_right_range = np.linspace(c, bin_centers_right.max(), 100)
ax.plot(X_right_range, lr_right.predict(X_right_range.reshape(-1, 1)),
color='red', linewidth=3, label='Fitted (right)')
# Cutoff
ax.axvline(x=c, color='green', linestyle='--', linewidth=2.5)
ax.set_xlabel('Running Variable (X)', fontsize=13, fontweight='bold')
ax.set_ylabel('Frequency', fontsize=13, fontweight='bold')
ax.set_title(f'McCrary Density Test (Jump = {jump:.2f})',
fontsize=15, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
return jump
# Run test
jump = mccrary_density_test(df['X'], c=0)
print(f"\nDensity jump: {jump:.2f}")Interpretation:
- No jump ($\hat{\theta} \approx 0$): density is continuous at the cutoff, no evidence of manipulation
- Jump ($\hat{\theta}$ significantly different from 0): density is discontinuous at the cutoff, possible manipulation
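A simple complement to the density test is a binomial count test: in a narrow window around the cutoff, absent manipulation, each observation should fall on either side roughly like a fair coin flip (this is the logic behind the binomial test recommended in Cattaneo, Idrobo, and Titiunik's RDD monograph). A minimal sketch on our simulated df, with an arbitrary illustrative window:

```python
from scipy import stats

# Count observations within +/-1 of the cutoff and test whether the
# left/right split is consistent with a fair 50/50 coin
window = 1.0  # illustrative choice; in practice try several window widths
near = df['X'][np.abs(df['X']) <= window]
n_right = int((near >= 0).sum())
res = stats.binomtest(n_right, n=len(near), p=0.5)
print(f"Obs within +/-{window}: {len(near)}, right of cutoff: {n_right}, "
      f"p = {res.pvalue:.3f}")
```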
Validity Test 3: Placebo Tests
Core Idea
Logic:
- If RDD identification is credible, effect should only appear at true cutoff
- At false cutoffs (placebo cutoffs), there should be no jump
Implementation:
- Choose a false cutoff $c'$ away from the true cutoff $c$
- Run the RDD at the false cutoff $c'$
- Test for a significant effect
Expected result:
- $H_0: \tau_{c'} = 0$ (no effect at the false cutoff)
- If $H_0$ is rejected → the RDD design may have problems
Python Implementation
def placebo_test(df, true_cutoff, placebo_cutoff, bandwidth=20):
"""
Placebo test
Parameters:
- df: dataframe
- true_cutoff: true cutoff
- placebo_cutoff: false cutoff
- bandwidth: bandwidth
"""
# Use only data left or right of true cutoff (avoid including true cutoff)
if placebo_cutoff < true_cutoff:
# Use left data
df_placebo = df[df['X'] < true_cutoff].copy()
else:
# Use right data
df_placebo = df[df['X'] >= true_cutoff].copy()
# Create false treatment variable
df_placebo['D_placebo'] = (df_placebo['X'] >= placebo_cutoff).astype(int)
df_placebo['X_c_placebo'] = df_placebo['X'] - placebo_cutoff
# Restrict to within bandwidth
df_placebo_local = df_placebo[np.abs(df_placebo['X_c_placebo']) <= bandwidth]
# Run RDD
model = smf.ols('Y ~ D_placebo + X_c_placebo + D_placebo:X_c_placebo',
data=df_placebo_local).fit()
return {
'placebo_cutoff': placebo_cutoff,
'effect': model.params['D_placebo'],
'se': model.bse['D_placebo'],
'pvalue': model.pvalues['D_placebo'],
'significant': model.pvalues['D_placebo'] < 0.05
}
# Try multiple false cutoffs
placebo_cutoffs = [-15, -10, -5, 5, 10, 15]
placebo_results = []
for pc in placebo_cutoffs:
if pc != 0: # Exclude true cutoff
result = placebo_test(df, true_cutoff=0, placebo_cutoff=pc, bandwidth=10)
placebo_results.append(result)
placebo_df = pd.DataFrame(placebo_results)
print("=" * 70)
print("Placebo Tests (False Cutoffs)")
print("=" * 70)
print(placebo_df.to_string(index=False))
print("\nExpected: All false cutoffs have p-value > 0.05 (no significant effect)")Visualizing Placebo Tests
fig, ax = plt.subplots(figsize=(12, 6))
# Plot estimated effects and confidence intervals
ax.errorbar(placebo_df['placebo_cutoff'], placebo_df['effect'],
yerr=1.96 * placebo_df['se'],
fmt='o', capsize=5, capthick=2, linewidth=2, markersize=8,
color='gray', label='Placebo cutoffs')
# Effect at true cutoff (build X_c explicitly; df itself has no X_c column)
df_true_local = df[np.abs(df['X']) <= 20].copy()
df_true_local['X_c'] = df_true_local['X']  # cutoff is 0, so X_c = X - 0
true_model = smf.ols('Y ~ D + X_c + D:X_c', data=df_true_local).fit()
true_effect = true_model.params['D']
true_se = true_model.bse['D']
ax.errorbar(0, true_effect, yerr=1.96 * true_se,
fmt='*', capsize=5, capthick=3, linewidth=3, markersize=15,
color='red', label='True cutoff')
# Reference line
ax.axhline(y=0, color='black', linestyle='-', linewidth=1)
ax.set_xlabel('Cutoff', fontsize=13, fontweight='bold')
ax.set_ylabel('Estimated RDD Effect', fontsize=13, fontweight='bold')
ax.set_title('Placebo Test: Effect at Different Cutoffs',
fontsize=15, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Interpretation:
- False cutoffs show no effect → passes the placebo test
- A false cutoff shows an effect → the RDD design may have problems (though with many placebo cutoffs, expect about 5% to be significant at the 5% level by chance)
Validity Test 4: RDD with Covariates as Outcomes
Core Idea
Logic:
- If treatment near cutoff is "quasi-random"
- Then running RDD on baseline covariates should show no effect
Implementation:
- Replace outcome variable with covariate
- Run standard RDD
- Test for jump
This is essentially the same as the covariate balance test, but implemented with the full nonparametric RDD estimator (rdrobust) rather than a local OLS within a fixed bandwidth.
Python Implementation
from rdrobust import rdrobust
# Run RDD on each covariate
print("=" * 70)
print("Covariate RDD Tests")
print("=" * 70)
for cov in covariates:
result = rdrobust(y=df[cov], x=df['X'], c=0)
print(f"\n{cov}:")
print(f" RDD effect: {result.coef[0]:.4f}")
print(f" p-value: {result.pval[0]:.4f}")
print(f" Conclusion: {'Imbalanced' if result.pval[0] < 0.05 else 'Balanced'}")Complete Validity Test Report
Integrate all tests into one report:
def rdd_validity_report(df, X_col, D_col, Y_col, covariates, cutoff=0, bandwidth=20):
"""
Generate complete RDD validity test report
Parameters:
- df: dataframe
- X_col: running variable column name
- D_col: treatment variable column name
- Y_col: outcome variable column name
- covariates: list of covariates
- cutoff: cutoff
- bandwidth: bandwidth
"""
df = df.copy()
df['X_c'] = df[X_col] - cutoff
print("=" * 80)
print(" " * 20 + "RDD Validity Test Report")
print("=" * 80)
# 1. Main effect estimate
print("\n[1] Main Effect Estimate")
print("-" * 80)
main_result = rdrobust(y=df[Y_col], x=df[X_col], c=cutoff)
print(f"RDD effect: {main_result.coef[0]:.4f}")
print(f"Robust p-value: {main_result.pval[0]:.4f}")
print(f"95% CI: [{main_result.ci[0][0]:.4f}, {main_result.ci[0][1]:.4f}]")
# 2. Covariate balance tests
print("\n[2] Covariate Balance Tests")
print("-" * 80)
balance_results = []
for cov in covariates:
result = rdrobust(y=df[cov], x=df[X_col], c=cutoff)
balance_results.append({
'Covariate': cov,
'Jump': result.coef[0],
'p-value': result.pval[0],
'Balanced': '✓' if result.pval[0] > 0.05 else '✗'
})
balance_df = pd.DataFrame(balance_results)
print(balance_df.to_string(index=False))
# 3. Density test
print("\n[3] McCrary Density Test")
print("-" * 80)
try:
from rddensity import rddensity
density_result = rddensity(X=df[X_col], c=cutoff)
print(f"T-statistic: {density_result.test['T'][0]:.4f}")
print(f"p-value: {density_result.pval[0]:.4f}")
print(f"Conclusion: {'✓ Density continuous (no manipulation evidence)' if density_result.pval[0] > 0.05 else '✗ Density discontinuous (possible manipulation)'}")
except ImportError:
print("rddensity package not installed, skipping density test")
# 4. Placebo tests
print("\n[4] Placebo Tests (False Cutoffs)")
print("-" * 80)
placebo_cutoffs = [cutoff - 15, cutoff - 10, cutoff + 10, cutoff + 15]
placebo_results = []
for pc in placebo_cutoffs:
# Use subsample not containing true cutoff
if pc < cutoff:
            df_sub = df[df[X_col] < cutoff].copy()  # .copy() avoids SettingWithCopyWarning
        else:
            df_sub = df[df[X_col] >= cutoff].copy()
df_sub['D_placebo'] = (df_sub[X_col] >= pc).astype(int)
df_sub['X_c_placebo'] = df_sub[X_col] - pc
df_sub_local = df_sub[np.abs(df_sub['X_c_placebo']) <= bandwidth]
if len(df_sub_local) > 50:
model = smf.ols(f'{Y_col} ~ D_placebo + X_c_placebo + D_placebo:X_c_placebo',
data=df_sub_local).fit()
placebo_results.append({
'Placebo Cutoff': pc,
'Effect': model.params['D_placebo'],
'p-value': model.pvalues['D_placebo'],
'Significant': '✗' if model.pvalues['D_placebo'] < 0.05 else '✓'
})
placebo_df = pd.DataFrame(placebo_results)
print(placebo_df.to_string(index=False))
print("\nExpected: All false cutoffs non-significant (✓)")
print("\n" + "=" * 80)
print(" " * 25 + "Report Complete")
print("=" * 80)
# Generate report
rdd_validity_report(
df=df,
X_col='X',
D_col='D',
Y_col='Y',
covariates=['age', 'gender', 'income'],
cutoff=0,
bandwidth=20
)
Key Takeaways
Continuity Assumption
- Core: Potential outcomes continuous (smooth) at cutoff
- Untestable: Involves counterfactuals, cannot directly observe
- Indirect verification: Through covariate balance, density tests, etc.
Covariate Balance Tests
- Logic: If treatment is quasi-random, covariates should balance
- Implementation: Run RDD on each covariate, test for jump
- Interpretation: p > 0.05 indicates balance (good)
McCrary Density Test
- Purpose: Detect precise manipulation
- Method: Test if density of running variable is continuous at cutoff
- Interpretation: p > 0.05 indicates no manipulation evidence (good)
Placebo Tests
- Logic: Effect should only appear at true cutoff
- Implementation: Run RDD at false cutoffs
- Interpretation: False cutoffs showing no effect indicates credible design (good)
Section Summary
In this section, we learned:
- Meaning and threat factors of continuity assumption
- Theory and implementation of covariate balance tests
- McCrary density test (detecting manipulation)
- Placebo tests (using false cutoffs)
- Complete Python implementation and report generation
Key lesson:
"RDD's credibility depends on the continuity assumption. While it cannot be directly tested, we can verify its plausibility through multiple indirect methods."
Next step: In Section 11.4, we will take a closer look at bandwidth selection, sensitivity analysis, and other robustness checks.
Rigorously test assumptions to ensure credible causal inference!