
2.5 Identification Strategies and Validity

"No causation without manipulation."— Paul Holland, Statistician

Core Assumptions and Threats in Causal Inference


Section Objectives

  • Understand core assumptions of causal identification (SUTVA, independence)
  • Master the distinction between internal and external validity
  • Identify threats to RCT validity
  • Learn sensitivity analysis methods

Core Assumptions of Causal Identification

What is "Identification"?

Causal identification: Whether we can uniquely determine the causal effect from observed data

Intuitive understanding:

  • Not identifiable: Data is compatible with multiple different causal effects (cannot distinguish)
  • Identifiable: Data is compatible with only one causal effect (can be estimated)

1️⃣ SUTVA Assumption

Definition

SUTVA: the Stable Unit Treatment Value Assumption

Contains two sub-assumptions:

(1) No Interference

Meaning: Individual $i$'s potential outcomes depend only on their own treatment status, not on the treatment status of others

Cases:

  • Satisfied: Drug trial (each person takes medicine independently)
  • Violated: Vaccination (herd immunity effect), education policy (peer effects)

(2) No Hidden Variations of Treatment

Meaning: Treatment has only one form, no hidden variations

Cases:

  • Violated: "Online course" may include:

    • Live lectures (high interaction)
    • Recorded lectures (low interaction)
    • Self-study materials (no interaction)

    Different forms of "online courses" may have different effects

Consequences of SUTVA Violation

python
import numpy as np
import pandas as pd

# Simulate spillover effects
np.random.seed(42)
n = 100

# Generate social network (neighbor relationships)
friends = np.random.randint(0, n, size=(n, 3))  # Each person has 3 friends

# Random assignment
data = pd.DataFrame({
    'id': range(n),
    'treatment': np.random.binomial(1, 0.5, n)
})

# Calculate proportion of friends receiving treatment
data['friends_treated'] = 0.0  # initialize as float; the loop below fills in proportions
for i in range(n):
    friend_ids = friends[i]
    data.loc[i, 'friends_treated'] = data.loc[friend_ids, 'treatment'].mean()

# Outcome variable (includes spillover effects)
# Both own treatment + friends' treatment have effects
data['Y'] = (100 +
             30 * data['treatment'] +  # Direct effect
             15 * data['friends_treated'] +  # Spillover effect
             np.random.normal(0, 10, n))

# Simple comparison mixes direct and spillover effects
simple_diff = (data[data['treatment'] == 1]['Y'].mean() -
               data[data['treatment'] == 0]['Y'].mean())

print(f"Simple comparison: {simple_diff:.2f}")
print("True direct effect: 30")
print("Spillover effect: 15 × (proportion of friends treated)")
print("\n⚠️ SUTVA violation → Simple comparison estimate is biased!")

Addressing SUTVA Violations

| Method | Applicable Scenario | Example |
| --- | --- | --- |
| Cluster randomization | Spillover effects limited to within groups | Randomize by school |
| Two-stage experiments | Randomize at both the group and individual level | First randomize villages, then villagers within them |
| Network experimental design | Known social network structure | Randomize non-adjacent nodes |
| Structural models | Estimate direct and spillover effects jointly | Spatial econometric models |
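
The first row can be made concrete with a minimal sketch of cluster (school-level) randomization on hypothetical data: treatment is assigned to whole schools, so spillovers among classmates stay inside a single experimental arm.

python
import numpy as np
import pandas as pd

# Hypothetical data: 20 schools with 20 students each
np.random.seed(0)
students = pd.DataFrame({
    'student_id': range(400),
    'school_id': np.repeat(range(20), 20)
})

# Randomize at the school level: half of the schools are treated
schools = pd.Series(students['school_id'].unique())
treated_schools = schools.sample(frac=0.5, random_state=0)

# Every student inherits their school's assignment
students['treatment'] = students['school_id'].isin(treated_schools).astype(int)

print(students.groupby('school_id')['treatment'].mean().head())  # 0 or 1 per school

Outcomes are then compared across treated and control schools, with standard errors clustered at the school level (see the answer hints at the end of this section).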

2️⃣ Independence Assumption (Unconfoundedness)

Independence Under Randomization

Strong independence: $(Y_i(0), Y_i(1)) \perp D_i$, where $D_i$ denotes treatment assignment

Meaning: The potential outcomes are statistically independent of treatment assignment

Core advantage of RCT: Randomization automatically satisfies this assumption

Conditional Independence

In observational studies (without randomization), we cannot rely on the design to deliver independence; instead we must assume conditional independence:

$(Y_i(0), Y_i(1)) \perp D_i \mid X_i$

Meaning: Given the observed covariates $X_i$, treatment assignment is independent of the potential outcomes

Common name: "Selection on Observables"

Testing Independence: Balance Checks

python
from scipy import stats
import pandas as pd
import numpy as np

def independence_test(data, covariates, treatment_col='treatment'):
    """
    Test independence between treatment assignment and covariates
    """
    results = []

    for cov in covariates:
        if data[cov].dtype in ['float64', 'int64']:
            # Continuous variable: t-test
            treated = data[data[treatment_col] == 1][cov]
            control = data[data[treatment_col] == 0][cov]
            stat, p_value = stats.ttest_ind(treated, control)
            test_type = 't-test'
        else:
            # Categorical variable: chi-square test
            contingency = pd.crosstab(data[cov], data[treatment_col])
            stat, p_value, _, _ = stats.chi2_contingency(contingency)
            test_type = 'chi2'

        results.append({
            'Covariate': cov,
            'Test type': test_type,
            'Statistic': stat,
            'p-value': p_value,
            'Balanced': '✓' if p_value > 0.05 else '✗'
        })

    return pd.DataFrame(results)

# Example
np.random.seed(42)
n = 200

data = pd.DataFrame({
    'treatment': np.random.binomial(1, 0.5, n),
    'age': np.random.normal(30, 10, n),
    'income': np.random.lognormal(10, 1, n),
    'gender': np.random.choice(['M', 'F'], n),
    'education': np.random.choice(['High School', 'Bachelor', 'Graduate'], n)
})

balance_results = independence_test(
    data,
    covariates=['age', 'income', 'gender', 'education']
)

print(balance_results)

3️⃣ Internal Validity

Definition

Internal validity: Whether causal inference is correct within the study sample

Meaning: Whether the estimator provides an unbiased and consistent estimate of the causal effect within the sample

Threats

1. Selection Bias

Source: Treatment and control groups are not comparable at baseline

Situations that may occur in RCT:

  • Randomization failure (technical issues)
  • Sample size too small (random fluctuation)
  • Improper stratification

Test: Balance checks (see above)

Response:

  • Re-randomize
  • Regression control for unbalanced variables (see the sketch below)
  • Matching or IPW (inverse probability weighting)
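
A minimal sketch of the regression-control response, on hypothetical data where a covariate (here `age`) happens to be imbalanced across arms: include it in the outcome regression alongside the treatment dummy.

python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data with an imbalanced covariate
np.random.seed(0)
n = 500
df = pd.DataFrame({'treatment': np.random.binomial(1, 0.5, n)})
df['age'] = np.random.normal(30, 10, n) + 2 * df['treatment']  # age is slightly imbalanced
df['Y'] = 50 + 5 * df['treatment'] + 0.8 * df['age'] + np.random.normal(0, 5, n)

# Unadjusted vs. covariate-adjusted estimates of the treatment effect
unadj = sm.OLS(df['Y'], sm.add_constant(df[['treatment']])).fit()
adj = sm.OLS(df['Y'], sm.add_constant(df[['treatment', 'age']])).fit()

print(f"Unadjusted estimate: {unadj.params['treatment']:.2f}")
print(f"Age-adjusted estimate: {adj.params['treatment']:.2f} (true effect: 5)")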

2. Attrition

Definition: Some participants drop out, causing missing outcome data

python
# Simulate attrition
np.random.seed(42)
n = 500

data = pd.DataFrame({
    'treatment': np.random.binomial(1, 0.5, n),
    'baseline_health': np.random.normal(50, 10, n)
})

# Potential outcomes
data['Y0'] = 50 + 0.5 * data['baseline_health'] + np.random.normal(0, 5, n)
data['Y1'] = data['Y0'] + 10 + np.random.normal(0, 5, n)

# Attrition: participants with poor baseline health are more likely to drop out,
# and attrition is heavier in the control group (differential attrition)
retention_index = (data['baseline_health'] - 40) / 5 + 1.5 * data['treatment']
attrition_prob = 1 / (1 + np.exp(retention_index))
data['attrited'] = np.random.binomial(1, attrition_prob)

# Observed outcome (only non-attrited have data)
data['Y_obs'] = np.where(data['treatment'] == 1, data['Y1'], data['Y0'])
data.loc[data['attrited'] == 1, 'Y_obs'] = np.nan

# Complete sample ATE
complete_ATE = data['Y1'].mean() - data['Y0'].mean()

# ATE with attrition
observed_data = data.dropna(subset=['Y_obs'])
attrited_ATE = (observed_data[observed_data['treatment'] == 1]['Y_obs'].mean() -
                observed_data[observed_data['treatment'] == 0]['Y_obs'].mean())

print(f"Complete sample ATE: {complete_ATE:.2f}")
print(f"ATE with attrition: {attrited_ATE:.2f}")
print(f"Bias: {attrited_ATE - complete_ATE:.2f}")

# Attrition rate comparison
attrition_by_treatment = data.groupby('treatment')['attrited'].mean()
print("\nAttrition rates:")
print(attrition_by_treatment)

Detection methods:

  1. Compare attrition rates between groups (should be similar)
  2. Check baseline characteristics of attrited vs retained

Response methods:

  • Lee Bounds: Estimate upper and lower bounds of effect
  • IPW: Inverse probability weighting (weight observed cases by the inverse of their retention probability; see the sketch below)
  • Sensitivity analysis: Assume different attrition mechanisms
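
A minimal sketch of the IPW response, reusing the simulated attrition data above: model retention as a function of baseline health and treatment, then weight the observed cases by the inverse of their predicted retention probability.

python
from sklearn.linear_model import LogisticRegression

# Model the probability of being retained (not attrited)
data['retained'] = 1 - data['attrited']
ret_model = LogisticRegression()
ret_model.fit(data[['baseline_health', 'treatment']], data['retained'])
data['p_retain'] = ret_model.predict_proba(data[['baseline_health', 'treatment']])[:, 1]

# Weight observed cases by 1 / P(retained) and compare weighted group means
obs = data[data['retained'] == 1].copy()
obs['w'] = 1 / obs['p_retain']

def wmean(df, col):
    return (df[col] * df['w']).sum() / df['w'].sum()

ipw_ATE = (wmean(obs[obs['treatment'] == 1], 'Y_obs') -
           wmean(obs[obs['treatment'] == 0], 'Y_obs'))
print(f"IPW-adjusted ATE: {ipw_ATE:.2f}")

In this simulation attrition depends only on observables (baseline health and treatment), so reweighting approximately recovers the complete-sample ATE; if attrition also depended on unobservables, bounds (such as the Lee Bounds below) or sensitivity analysis would be needed.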

3. Hawthorne Effect (Observer Effect)

Definition: Participants change behavior knowing they're being observed

Cases:

  • Health research: Knowing they're monitored, pay more attention to diet and exercise
  • Education experiments: Teachers and students try harder because they're being studied

Response:

  • Double blind design: Neither participants nor researchers know assignment
  • Placebo control group: Control group receives placebo treatment
  • Unobtrusive measures: Use administrative data instead of surveys

4. Spillover Effects

See SUTVA section

5. John Henry Effect (opposite of Hawthorne)

Definition: Control group tries extra hard because they don't want to "lose"

Case: Control schools hear another school is piloting a new teaching method, competitively improve teaching quality


4️⃣ External Validity

Definition

External validity: Whether the estimated causal effect generalizes to other populations, settings, and time periods

Threats

1. Sample Selection Bias

Problem: Experimental sample doesn't represent target population

Cases:

  • College student sample → Generalize to society?
  • Volunteer sample → Generalize to mandatory participation?
  • Single region → Generalize nationwide?
python
# Simulate sample selection bias
np.random.seed(42)

# Population (N = 10,000)
population = pd.DataFrame({
    'id': range(10000),
    'ability': np.random.normal(100, 20, 10000)
})

# True ATE (in population)
population['tau'] = 10 + 0.1 * population['ability']
true_population_ATE = population['tau'].mean()

# Experimental sample (only high-ability people participate)
sample_prob = 1 / (1 + np.exp(-(population['ability'] - 100) / 10))
sample = population[np.random.binomial(1, sample_prob, 10000) == 1].copy()

# Conduct RCT in sample
sample['treatment'] = np.random.binomial(1, 0.5, len(sample))
sample['Y0'] = 50 + 0.5 * sample['ability']
sample['Y1'] = sample['Y0'] + sample['tau']
sample['Y_obs'] = np.where(sample['treatment'] == 1, sample['Y1'], sample['Y0'])

# Estimate ATE (in sample)
sample_ATE = (sample[sample['treatment'] == 1]['Y_obs'].mean() -
              sample[sample['treatment'] == 0]['Y_obs'].mean())

print(f"Population ATE: {true_population_ATE:.2f}")
print(f"Sample ATE: {sample_ATE:.2f}")
print(f"External validity bias: {sample_ATE - true_population_ATE:.2f}")

print(f"\nPopulation average ability: {population['ability'].mean():.2f}")
print(f"Sample average ability: {sample['ability'].mean():.2f}")

2. Experimental Environment vs Real Environment

Problem: Experimental conditions too idealized

Cases:

  • Laboratory experiments → Real decision environment
  • Small-scale pilot → Large-scale rollout
  • Short-term effects → Long-term effects

3. Context-Dependent Treatment Effects

Problem: Effects depend on specific background conditions

Cases:

  • Online education works well during pandemic → After pandemic?
  • Policy effective during economic boom → During recession?

Methods to Improve External Validity

| Method | Description |
| --- | --- |
| Multi-site replication | Repeat the experiment in different regions and at different times |
| Heterogeneity analysis | Study effects across subgroups to identify boundary conditions |
| Meta-analysis | Synthesize results from multiple studies |
| Re-weighting | Weight the sample to match the population distribution |
python
# Re-weighting to improve external validity
from sklearn.linear_model import LogisticRegression

# 1. Estimate sample selection probability (propensity score)
X = population[['ability']]
y = np.isin(population['id'], sample['id']).astype(int)

ps_model = LogisticRegression()
ps_model.fit(X, y)
population['ps'] = ps_model.predict_proba(X)[:, 1]

# 2. Calculate weights
sample_with_weights = population[population['id'].isin(sample['id'])].copy()
sample_with_weights['weight'] = 1 / sample_with_weights['ps']

# 3. Weighted ATE estimation
# (Simplified: this reweights the known individual effects tau; a full estimator combines the weights with treatment assignment, as sketched below)
weighted_mean_tau = (sample_with_weights['tau'] * sample_with_weights['weight']).sum() / sample_with_weights['weight'].sum()

print(f"Unweighted ATE: {sample['tau'].mean():.2f}")
print(f"Weighted ATE: {weighted_mean_tau:.2f}")
print(f"Population ATE: {true_population_ATE:.2f}")

5️⃣ Sensitivity Analysis

Purpose

Assess robustness of results to assumption violations

Method 1: Omitted Variable Bias Analysis

Question: If unobserved confounders exist, how much would results change?

python
def sensitivity_analysis_omitted_variable(data, treatment, outcome, r2_confounder_treatment, r2_confounder_outcome):
    """
    Omitted variable sensitivity analysis

    Parameters:
    - r2_confounder_treatment: Proportion of treatment variation explained by omitted variable
    - r2_confounder_outcome: Proportion of outcome variation explained by omitted variable
    """
    import statsmodels.api as sm

    # Observed ATE
    X = sm.add_constant(data[treatment])
    model = sm.OLS(data[outcome], X).fit()
    observed_ATE = model.params[treatment]

    # Calculate bias
    # Rough bias bound in the spirit of Cinelli & Hazlett (2020);
    # the exact formula also depends on the residual variation in treatment and outcome
    bias = np.sqrt(r2_confounder_treatment * r2_confounder_outcome) * data[outcome].std()

    # Adjusted estimates
    adjusted_ATE_upper = observed_ATE + bias
    adjusted_ATE_lower = observed_ATE - bias

    return {
        'Observed ATE': observed_ATE,
        'Bias upper bound': bias,
        'Adjusted ATE upper bound': adjusted_ATE_upper,
        'Adjusted ATE lower bound': adjusted_ATE_lower
    }

# Example
result = sensitivity_analysis_omitted_variable(
    sample,
    treatment='treatment',
    outcome='Y_obs',
    r2_confounder_treatment=0.1,  # Omitted variable explains 10% of treatment variation
    r2_confounder_outcome=0.2      # Omitted variable explains 20% of outcome variation
)

print("Sensitivity analysis results:")
for k, v in result.items():
    print(f"  {k}: {v:.2f}")

Method 2: Lee Bounds for Attrition

python
def lee_bounds(data, treatment, outcome, attrited):
    """
    Lee (2009) bounds estimation
    Handle selection bias from differential attrition by trimming
    the group that retains the larger share of its sample
    """
    # Retention (non-attrition) rates in each arm
    retain_treated = 1 - data[data[treatment] == 1][attrited].mean()
    retain_control = 1 - data[data[treatment] == 0][attrited].mean()

    # Non-attrited sample
    observed = data[data[attrited] == 0]
    treated_obs = observed[observed[treatment] == 1][outcome]
    control_obs = observed[observed[treatment] == 0][outcome]

    # Simple estimate (no trimming)
    simple_ATE = treated_obs.mean() - control_obs.mean()

    # Trim the group with the higher retention rate so that both arms
    # keep the same share of their original sample
    if retain_treated >= retain_control:
        trim_vals, other_mean, treated_is_trimmed = treated_obs, control_obs.mean(), True
        trim_prop = (retain_treated - retain_control) / retain_treated
    else:
        trim_vals, other_mean, treated_is_trimmed = control_obs, treated_obs.mean(), False
        trim_prop = (retain_control - retain_treated) / retain_control

    n_trim = int(round(len(trim_vals) * trim_prop))
    sorted_vals = trim_vals.sort_values()
    mean_keep_high = sorted_vals.iloc[n_trim:].mean()                    # drop the lowest outcomes
    mean_keep_low = sorted_vals.iloc[:len(sorted_vals) - n_trim].mean()  # drop the highest outcomes

    # Bounds depend on which arm was trimmed
    if treated_is_trimmed:
        upper_bound = mean_keep_high - other_mean
        lower_bound = mean_keep_low - other_mean
    else:
        upper_bound = other_mean - mean_keep_low
        lower_bound = other_mean - mean_keep_high

    return {
        'Simple estimate': simple_ATE,
        'Lee lower bound': lower_bound,
        'Lee upper bound': upper_bound
    }

# Use previous attrition data
bounds = lee_bounds(data, 'treatment', 'Y_obs', 'attrited')
print("\nLee Bounds:")
for k, v in bounds.items():
    if not np.isnan(v):
        print(f"  {k}: {v:.2f}")

Method 3: Placebo Test

python
def placebo_test(data, treatment, outcome, placebo_treatment):
    """
    Placebo test: Use a "fake treatment" that should have no effect
    """
    import statsmodels.api as sm

    # Real treatment effect
    X_real = sm.add_constant(data[treatment])
    model_real = sm.OLS(data[outcome], X_real).fit()
    real_effect = model_real.params[treatment]
    real_pvalue = model_real.pvalues[treatment]

    # Placebo treatment "effect"
    X_placebo = sm.add_constant(data[placebo_treatment])
    model_placebo = sm.OLS(data[outcome], X_placebo).fit()
    placebo_effect = model_placebo.params[placebo_treatment]
    placebo_pvalue = model_placebo.pvalues[placebo_treatment]

    print("Placebo test:")
    print(f"  Real treatment effect: {real_effect:.2f} (p={real_pvalue:.4f})")
    print(f"  Placebo effect: {placebo_effect:.2f} (p={placebo_pvalue:.4f})")

    if placebo_pvalue > 0.05:
        print("  ✓ Placebo not significant, passes test")
    else:
        print("  ✗ Placebo significant, possible issues (e.g., selection bias)")

# Example: use the previous unit's (effectively random) treatment as a placebo
sample['placebo_treatment'] = np.roll(sample['treatment'], 1)
placebo_test(sample, 'treatment', 'Y_obs', 'placebo_treatment')

Identification Strategy Comparison

| Strategy | Identification Source | Core Assumption | Internal Validity | External Validity | Common Threats |
| --- | --- | --- | --- | --- | --- |
| RCT | Random assignment | SUTVA | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | Attrition, Hawthorne |
| DID | Parallel trends | No differential trends | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Trend violations, reverse causality |
| RDD | Continuity | Continuity assumption | ⭐⭐⭐⭐⭐ | ⭐⭐ | Manipulation, functional form |
| IV | Exogenous shock | Exclusion restriction | ⭐⭐⭐⭐ | ⭐⭐ | Weak instruments |
| PSM | Selection on observables | Conditional independence | ⭐⭐ | ⭐⭐⭐ | Hidden bias |

Summary

Key Points

  1. SUTVA Assumption

    • No spillover effects
    • Treatment consistency
    • Violation → Cluster randomization
  2. Independence Assumption

    • RCT satisfies it automatically
    • Verified with balance checks
  3. Internal Validity

    • Whether causal inference is correct within sample
    • Threats: Selection bias, attrition, Hawthorne
  4. External Validity

    • Whether generalizable to other populations
    • Improve: Multi-site, heterogeneity analysis, re-weighting
  5. Sensitivity Analysis

    • Assess impact of assumption violations
    • Methods: Omitted variables, Lee Bounds, Placebo

Practice Questions

  1. Understanding question: Explain what "high internal validity but low external validity" means, with examples.

  2. Case question: A drug RCT finds:

    • Treatment group attrition rate 30%, control group 15%
    • Complete data shows ATE = 10

    Questions:

    • (a) Is this ATE credible? Why or why not?
    • (b) How to respond?
  3. Design question: You're designing an education RCT and worried about spillover effects (classmates influencing each other).

    • (a) How would SUTVA be violated?
    • (b) How to modify experimental design?
    • (c) If you must randomize within classrooms, how to analyze data?
Answer hints

Question 1:

  • High internal validity: Causal inference correct within sample (e.g., rigorous RCT)
  • Low external validity: Sample doesn't represent population (e.g., experiment only at elite schools)
  • Example: Stanford online course experiment → Generalize to community colleges?

Question 2:

  • (a) Not credible! Large difference in attrition rates, likely differential attrition bias
  • (b) Use Lee Bounds to estimate effect bounds, or IPW method

Question 3:

  • (a) Treated students will influence control students (peer effects)
  • (b) Randomize by classroom (cluster randomization)
  • (c) Use cluster-robust standard errors (see the sketch below)
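
A minimal sketch of cluster-robust inference with statsmodels, on hypothetical data with a classroom identifier (`class_id` is assumed here and not part of the examples above):

python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: students nested in classrooms, treatment randomized within classrooms
np.random.seed(1)
df = pd.DataFrame({'class_id': np.repeat(range(20), 25)})
df['treatment'] = np.random.binomial(1, 0.5, len(df))
class_effect = np.random.normal(0, 5, 20)[df['class_id']]
df['score'] = 60 + 8 * df['treatment'] + class_effect + np.random.normal(0, 10, len(df))

# OLS with standard errors clustered at the classroom level
X = sm.add_constant(df[['treatment']])
model = sm.OLS(df['score'], X).fit(cov_type='cluster', cov_kwds={'groups': df['class_id']})
print(model.summary().tables[1])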

Next Steps

In the next section, we'll do Complete Python Practice, integrating all knowledge from this chapter to analyze a real RCT dataset.

Ready? 🚀


References:

  • Rubin, D. B. (1980). "Randomization analysis of experimental data: The Fisher randomization test comment". JASA.
  • Lee, D. S. (2009). "Training, wages, and sample selection: Estimating sharp bounds on treatment effects". Review of Economic Studies.
  • Cinelli, C., & Hazlett, C. (2020). "Making sense of sensitivity: Extending omitted variable bias". Journal of the Royal Statistical Society: Series B.
  • Manski, C. F. (1990). "Nonparametric bounds on treatment effects". American Economic Review.
