
2.5 Identification Strategies and Validity

"No causation without manipulation."— Paul Holland, Statistician

Core Assumptions and Threats in Causal Inference


Section Objectives

  • Understand core assumptions of causal identification (SUTVA, independence)
  • Master the distinction between internal and external validity
  • Identify threats to RCT validity
  • Learn sensitivity analysis methods

Core Assumptions of Causal Identification

What is "Identification"?

Causal identification: Whether we can uniquely determine the causal effect from observed data

Intuitive understanding:

  • Not identifiable: Data is compatible with multiple different causal effects (cannot distinguish)
  • Identifiable: Data is compatible with only one causal effect (can be estimated)

1️⃣ SUTVA Assumption

Definition

SUTVA: the Stable Unit Treatment Value Assumption

Contains two sub-assumptions:

(1) No Interference

Meaning: Individual $i$'s potential outcomes depend only on their own treatment status, not on the treatment status of others

Cases:

  • Satisfied: Drug trial (each person takes medicine independently)
  • Violated: Vaccination (herd immunity effect), education policy (peer effects)

(2) No Hidden Variations of Treatment

Meaning: Treatment has only one form, no hidden variations

Cases:

  • Violated: "Online course" may include:

    • Live lectures (high interaction)
    • Recorded lectures (low interaction)
    • Self-study materials (no interaction)

    Different forms of "online courses" may have different effects

Consequences of SUTVA Violation

python
import numpy as np
import pandas as pd

# Simulate spillover effects
np.random.seed(42)
n = 100

# Generate social network (neighbor relationships)
friends = np.random.randint(0, n, size=(n, 3))  # Each person has 3 friends

# Random assignment
data = pd.DataFrame({
    'id': range(n),
    'treatment': np.random.binomial(1, 0.5, n)
})

# Calculate proportion of friends receiving treatment
data['friends_treated'] = 0.0  # initialize as float; the loop below fills in proportions
for i in range(n):
    friend_ids = friends[i]
    data.loc[i, 'friends_treated'] = data.loc[friend_ids, 'treatment'].mean()

# Outcome variable (includes spillover effects)
# Both own treatment + friends' treatment have effects
data['Y'] = (100 +
             30 * data['treatment'] +  # Direct effect
             15 * data['friends_treated'] +  # Spillover effect
             np.random.normal(0, 10, n))

# Simple comparison mixes direct and spillover effects
simple_diff = (data[data['treatment'] == 1]['Y'].mean() -
               data[data['treatment'] == 0]['Y'].mean())

print(f"Simple comparison: {simple_diff:.2f}")
print("True direct effect: 30")
print("Spillover effect: 15 × (proportion of friends treated)")
print("\n⚠️ SUTVA violation → Simple comparison estimate is biased!")

Addressing SUTVA Violations

| Method | Applicable Scenario | Example |
| --- | --- | --- |
| Cluster randomization | Spillover effects limited to within groups | Randomize by school |
| Two-stage experiments | Randomize at both the group and individual level | First randomize villages, then villagers within them |
| Network experimental design | Known social network structure | Randomize non-adjacent nodes |
| Structural models | Estimate direct and spillover effects jointly | Spatial econometric models |
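
The first row can be made concrete with a minimal sketch of cluster (school-level) randomization on hypothetical data: treatment is assigned to whole schools, so spillovers among classmates stay inside a single experimental arm.

python
import numpy as np
import pandas as pd

# Hypothetical data: 20 schools with 20 students each
np.random.seed(0)
students = pd.DataFrame({
    'student_id': range(400),
    'school_id': np.repeat(range(20), 20)
})

# Randomize at the school level: half of the schools are treated
schools = pd.Series(students['school_id'].unique())
treated_schools = schools.sample(frac=0.5, random_state=0)

# Every student inherits their school's assignment
students['treatment'] = students['school_id'].isin(treated_schools).astype(int)

print(students.groupby('school_id')['treatment'].mean().head())  # 0 or 1 per school

Outcomes are then compared across treated and control schools, with standard errors clustered at the school level (see the answer hints at the end of this section).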

2️⃣ Independence Assumption (Unconfoundedness)

Independence Under Randomization

Strong independence: $(Y_i(0), Y_i(1)) \perp D_i$, where $D_i$ denotes treatment assignment

Meaning: The potential outcomes are statistically independent of treatment assignment

Core advantage of RCT: Randomization automatically satisfies this assumption

Conditional Independence

In observational studies (without randomization), we cannot rely on the design to deliver independence; instead we must assume conditional independence:

$(Y_i(0), Y_i(1)) \perp D_i \mid X_i$

Meaning: Given the observed covariates $X_i$, treatment assignment is independent of the potential outcomes

Common name: "Selection on Observables"

Testing Independence: Balance Checks

python
from scipy import stats
import pandas as pd
import numpy as np

def independence_test(data, covariates, treatment_col='treatment'):
    """
    Test independence between treatment assignment and covariates
    """
    results = []

    for cov in covariates:
        if data[cov].dtype in ['float64', 'int64']:
            # Continuous variable: t-test
            treated = data[data[treatment_col] == 1][cov]
            control = data[data[treatment_col] == 0][cov]
            stat, p_value = stats.ttest_ind(treated, control)
            test_type = 't-test'
        else:
            # Categorical variable: chi-square test
            contingency = pd.crosstab(data[cov], data[treatment_col])
            stat, p_value, _, _ = stats.chi2_contingency(contingency)
            test_type = 'chi2'

        results.append({
            'Covariate': cov,
            'Test type': test_type,
            'Statistic': stat,
            'p-value': p_value,
            'Balanced': '✓' if p_value > 0.05 else '✗'
        })

    return pd.DataFrame(results)

# Example
np.random.seed(42)
n = 200

data = pd.DataFrame({
    'treatment': np.random.binomial(1, 0.5, n),
    'age': np.random.normal(30, 10, n),
    'income': np.random.lognormal(10, 1, n),
    'gender': np.random.choice(['M', 'F'], n),
    'education': np.random.choice(['High School', 'Bachelor', 'Graduate'], n)
})

balance_results = independence_test(
    data,
    covariates=['age', 'income', 'gender', 'education']
)

print(balance_results)

3️⃣ Internal Validity

Definition

Internal validity: Whether causal inference is correct within the study sample

Meaning: Whether the estimator provides an unbiased and consistent estimate of the causal effect within the sample

Threats

1. Selection Bias

Source: Treatment and control groups are not comparable at baseline

Situations that may occur in RCT:

  • Randomization failure (technical issues)
  • Sample size too small (random fluctuation)
  • Improper stratification

Test: Balance checks (see above)

Response:

  • Re-randomize
  • Regression control for unbalanced variables (see the sketch below)
  • Matching or IPW (inverse probability weighting)
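
A minimal sketch of the regression-control response, on hypothetical data where a covariate (here `age`) happens to be imbalanced across arms: include it in the outcome regression alongside the treatment dummy.

python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data with an imbalanced covariate
np.random.seed(0)
n = 500
df = pd.DataFrame({'treatment': np.random.binomial(1, 0.5, n)})
df['age'] = np.random.normal(30, 10, n) + 2 * df['treatment']  # age is slightly imbalanced
df['Y'] = 50 + 5 * df['treatment'] + 0.8 * df['age'] + np.random.normal(0, 5, n)

# Unadjusted vs. covariate-adjusted estimates of the treatment effect
unadj = sm.OLS(df['Y'], sm.add_constant(df[['treatment']])).fit()
adj = sm.OLS(df['Y'], sm.add_constant(df[['treatment', 'age']])).fit()

print(f"Unadjusted estimate: {unadj.params['treatment']:.2f}")
print(f"Age-adjusted estimate: {adj.params['treatment']:.2f} (true effect: 5)")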

2. Attrition

Definition: Some participants drop out, causing missing outcome data

python
# Simulate attrition
np.random.seed(42)
n = 500

data = pd.DataFrame({
    'treatment': np.random.binomial(1, 0.5, n),
    'baseline_health': np.random.normal(50, 10, n)
})

# Potential outcomes
data['Y0'] = 50 + 0.5 * data['baseline_health'] + np.random.normal(0, 5, n)
data['Y1'] = data['Y0'] + 10 + np.random.normal(0, 5, n)

# Attrition: participants with poor baseline health are more likely to drop out,
# and attrition is heavier in the control group (differential attrition)
retention_index = (data['baseline_health'] - 40) / 5 + 1.5 * data['treatment']
attrition_prob = 1 / (1 + np.exp(retention_index))
data['attrited'] = np.random.binomial(1, attrition_prob)

# Observed outcome (only non-attrited have data)
data['Y_obs'] = np.where(data['treatment'] == 1, data['Y1'], data['Y0'])
data.loc[data['attrited'] == 1, 'Y_obs'] = np.nan

# Complete sample ATE
complete_ATE = data['Y1'].mean() - data['Y0'].mean()

# ATE with attrition
observed_data = data.dropna(subset=['Y_obs'])
attrited_ATE = (observed_data[observed_data['treatment'] == 1]['Y_obs'].mean() -
                observed_data[observed_data['treatment'] == 0]['Y_obs'].mean())

print(f"Complete sample ATE: {complete_ATE:.2f}")
print(f"ATE with attrition: {attrited_ATE:.2f}")
print(f"Bias: {attrited_ATE - complete_ATE:.2f}")

# Attrition rate comparison
attrition_by_treatment = data.groupby('treatment')['attrited'].mean()
print("\nAttrition rates:")
print(attrition_by_treatment)

Detection methods:

  1. Compare attrition rates between groups (should be similar)
  2. Check baseline characteristics of attrited vs retained

Response methods:

  • Lee Bounds: Estimate upper and lower bounds of effect
  • IPW: Inverse probability weighting (weight observed cases by the inverse of their retention probability; see the sketch below)
  • Sensitivity analysis: Assume different attrition mechanisms
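
A minimal sketch of the IPW response, reusing the simulated attrition data above: model retention as a function of baseline health and treatment, then weight the observed cases by the inverse of their predicted retention probability.

python
from sklearn.linear_model import LogisticRegression

# Model the probability of being retained (not attrited)
data['retained'] = 1 - data['attrited']
ret_model = LogisticRegression()
ret_model.fit(data[['baseline_health', 'treatment']], data['retained'])
data['p_retain'] = ret_model.predict_proba(data[['baseline_health', 'treatment']])[:, 1]

# Weight observed cases by 1 / P(retained) and compare weighted group means
obs = data[data['retained'] == 1].copy()
obs['w'] = 1 / obs['p_retain']

def wmean(df, col):
    return (df[col] * df['w']).sum() / df['w'].sum()

ipw_ATE = (wmean(obs[obs['treatment'] == 1], 'Y_obs') -
           wmean(obs[obs['treatment'] == 0], 'Y_obs'))
print(f"IPW-adjusted ATE: {ipw_ATE:.2f}")

In this simulation attrition depends only on observables (baseline health and treatment), so reweighting approximately recovers the complete-sample ATE; if attrition also depended on unobservables, bounds (such as the Lee Bounds below) or sensitivity analysis would be needed.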

3. Hawthorne Effect (Observer Effect)

Definition: Participants change behavior knowing they're being observed

Cases:

  • Health research: Knowing they're monitored, pay more attention to diet and exercise
  • Education experiments: Teachers and students try harder because they're being studied

Response:

  • Double blind design: Neither participants nor researchers know assignment
  • Placebo control group: Control group receives placebo treatment
  • Unobtrusive measures: Use administrative data instead of surveys

4. Spillover Effects

See SUTVA section

5. John Henry Effect (opposite of Hawthorne)

Definition: Control group tries extra hard because they don't want to "lose"

Case: Control schools hear another school is piloting a new teaching method, competitively improve teaching quality


4️⃣ External Validity

Definition

External validity: Whether the estimated causal effect generalizes to other populations, settings, and time periods

Threats

1. Sample Selection Bias

Problem: Experimental sample doesn't represent target population

Cases:

  • College student sample → Generalize to society?
  • Volunteer sample → Generalize to mandatory participation?
  • Single region → Generalize nationwide?
python
# Simulate sample selection bias
np.random.seed(42)

# Population (N = 10,000)
population = pd.DataFrame({
    'id': range(10000),
    'ability': np.random.normal(100, 20, 10000)
})

# True ATE (in population)
population['tau'] = 10 + 0.1 * population['ability']
true_population_ATE = population['tau'].mean()

# Experimental sample (only high-ability people participate)
sample_prob = 1 / (1 + np.exp(-(population['ability'] - 100) / 10))
sample = population[np.random.binomial(1, sample_prob, 10000) == 1].copy()

# Conduct RCT in sample
sample['treatment'] = np.random.binomial(1, 0.5, len(sample))
sample['Y0'] = 50 + 0.5 * sample['ability']
sample['Y1'] = sample['Y0'] + sample['tau']
sample['Y_obs'] = np.where(sample['treatment'] == 1, sample['Y1'], sample['Y0'])

# Estimate ATE (in sample)
sample_ATE = (sample[sample['treatment'] == 1]['Y_obs'].mean() -
              sample[sample['treatment'] == 0]['Y_obs'].mean())

print(f"Population ATE: {true_population_ATE:.2f}")
print(f"Sample ATE: {sample_ATE:.2f}")
print(f"External validity bias: {sample_ATE - true_population_ATE:.2f}")

print(f"\nPopulation average ability: {population['ability'].mean():.2f}")
print(f"Sample average ability: {sample['ability'].mean():.2f}")

2. Experimental Environment vs Real Environment

Problem: Experimental conditions too idealized

Cases:

  • Laboratory experiments → Real decision environment
  • Small-scale pilot → Large-scale rollout
  • Short-term effects → Long-term effects

3. Context-Dependent Treatment Effects

Problem: Effects depend on specific background conditions

Cases:

  • Online education works well during pandemic → After pandemic?
  • Policy effective during economic boom → During recession?

Methods to Improve External Validity

| Method | Description |
| --- | --- |
| Multi-site replication | Repeat the experiment in different regions and at different times |
| Heterogeneity analysis | Study effects across subgroups to identify boundary conditions |
| Meta-analysis | Synthesize results from multiple studies |
| Re-weighting | Weight the sample to match the population distribution |
python
# Re-weighting to improve external validity
from sklearn.linear_model import LogisticRegression

# 1. Estimate sample selection probability (propensity score)
X = population[['ability']]
y = np.isin(population['id'], sample['id']).astype(int)

ps_model = LogisticRegression()
ps_model.fit(X, y)
population['ps'] = ps_model.predict_proba(X)[:, 1]

# 2. Calculate weights
sample_with_weights = population[population['id'].isin(sample['id'])].copy()
sample_with_weights['weight'] = 1 / sample_with_weights['ps']

# 3. Weighted ATE estimation
# (Simplified: this reweights the known individual effects tau; a full estimator combines the weights with treatment assignment, as sketched below)
weighted_mean_tau = (sample_with_weights['tau'] * sample_with_weights['weight']).sum() / sample_with_weights['weight'].sum()

print(f"Unweighted ATE: {sample['tau'].mean():.2f}")
print(f"Weighted ATE: {weighted_mean_tau:.2f}")
print(f"Population ATE: {true_population_ATE:.2f}")

5️⃣ Sensitivity Analysis

Purpose

Assess robustness of results to assumption violations

Method 1: Omitted Variable Bias Analysis

Question: If unobserved confounders exist, how much would results change?

python
def sensitivity_analysis_omitted_variable(data, treatment, outcome, r2_confounder_treatment, r2_confounder_outcome):
    """
    Omitted variable sensitivity analysis

    Parameters:
    - r2_confounder_treatment: Proportion of treatment variation explained by omitted variable
    - r2_confounder_outcome: Proportion of outcome variation explained by omitted variable
    """
    import statsmodels.api as sm

    # Observed ATE
    X = sm.add_constant(data[treatment])
    model = sm.OLS(data[outcome], X).fit()
    observed_ATE = model.params[treatment]

    # Calculate bias
    # Rough bias bound in the spirit of Cinelli & Hazlett (2020);
    # the exact formula also depends on the residual variation in treatment and outcome
    bias = np.sqrt(r2_confounder_treatment * r2_confounder_outcome) * data[outcome].std()

    # Adjusted estimates
    adjusted_ATE_upper = observed_ATE + bias
    adjusted_ATE_lower = observed_ATE - bias

    return {
        'Observed ATE': observed_ATE,
        'Bias upper bound': bias,
        'Adjusted ATE upper bound': adjusted_ATE_upper,
        'Adjusted ATE lower bound': adjusted_ATE_lower
    }

# Example
result = sensitivity_analysis_omitted_variable(
    sample,
    treatment='treatment',
    outcome='Y_obs',
    r2_confounder_treatment=0.1,  # Omitted variable explains 10% of treatment variation
    r2_confounder_outcome=0.2      # Omitted variable explains 20% of outcome variation
)

print("Sensitivity analysis results:")
for k, v in result.items():
    print(f"  {k}: {v:.2f}")

Method 2: Lee Bounds for Attrition

python
def lee_bounds(data, treatment, outcome, attrited):
    """
    Lee (2009) bounds estimation
    Handle selection bias from differential attrition by trimming
    the group that retains the larger share of its sample
    """
    # Retention (non-attrition) rates in each arm
    retain_treated = 1 - data[data[treatment] == 1][attrited].mean()
    retain_control = 1 - data[data[treatment] == 0][attrited].mean()

    # Non-attrited sample
    observed = data[data[attrited] == 0]
    treated_obs = observed[observed[treatment] == 1][outcome]
    control_obs = observed[observed[treatment] == 0][outcome]

    # Simple estimate (no trimming)
    simple_ATE = treated_obs.mean() - control_obs.mean()

    # Trim the group with the higher retention rate so that both arms
    # keep the same share of their original sample
    if retain_treated >= retain_control:
        trim_vals, other_mean, treated_is_trimmed = treated_obs, control_obs.mean(), True
        trim_prop = (retain_treated - retain_control) / retain_treated
    else:
        trim_vals, other_mean, treated_is_trimmed = control_obs, treated_obs.mean(), False
        trim_prop = (retain_control - retain_treated) / retain_control

    n_trim = int(round(len(trim_vals) * trim_prop))
    sorted_vals = trim_vals.sort_values()
    mean_keep_high = sorted_vals.iloc[n_trim:].mean()                    # drop the lowest outcomes
    mean_keep_low = sorted_vals.iloc[:len(sorted_vals) - n_trim].mean()  # drop the highest outcomes

    # Bounds depend on which arm was trimmed
    if treated_is_trimmed:
        upper_bound = mean_keep_high - other_mean
        lower_bound = mean_keep_low - other_mean
    else:
        upper_bound = other_mean - mean_keep_low
        lower_bound = other_mean - mean_keep_high

    return {
        'Simple estimate': simple_ATE,
        'Lee lower bound': lower_bound,
        'Lee upper bound': upper_bound
    }

# Use previous attrition data
bounds = lee_bounds(data, 'treatment', 'Y_obs', 'attrited')
print("\nLee Bounds:")
for k, v in bounds.items():
    if not np.isnan(v):
        print(f"  {k}: {v:.2f}")

Method 3: Placebo Test

python
def placebo_test(data, treatment, outcome, placebo_treatment):
    """
    Placebo test: Use a "fake treatment" that should have no effect
    """
    import statsmodels.api as sm

    # Real treatment effect
    X_real = sm.add_constant(data[treatment])
    model_real = sm.OLS(data[outcome], X_real).fit()
    real_effect = model_real.params[treatment]
    real_pvalue = model_real.pvalues[treatment]

    # Placebo treatment "effect"
    X_placebo = sm.add_constant(data[placebo_treatment])
    model_placebo = sm.OLS(data[outcome], X_placebo).fit()
    placebo_effect = model_placebo.params[placebo_treatment]
    placebo_pvalue = model_placebo.pvalues[placebo_treatment]

    print("Placebo test:")
    print(f"  Real treatment effect: {real_effect:.2f} (p={real_pvalue:.4f})")
    print(f"  Placebo effect: {placebo_effect:.2f} (p={placebo_pvalue:.4f})")

    if placebo_pvalue > 0.05:
        print("  ✓ Placebo not significant, passes test")
    else:
        print("  ✗ Placebo significant, possible issues (e.g., selection bias)")

# Example: use the previous unit's (effectively random) treatment as a placebo
sample['placebo_treatment'] = np.roll(sample['treatment'], 1)
placebo_test(sample, 'treatment', 'Y_obs', 'placebo_treatment')

Identification Strategy Comparison

| Strategy | Identification Source | Core Assumption | Internal Validity | External Validity | Common Threats |
| --- | --- | --- | --- | --- | --- |
| RCT | Random assignment | SUTVA | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | Attrition, Hawthorne |
| DID | Parallel trends | No differential trends | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Trend violations, reverse causality |
| RDD | Continuity | Continuity assumption | ⭐⭐⭐⭐⭐ | ⭐⭐ | Manipulation, functional form |
| IV | Exogenous shock | Exclusion restriction | ⭐⭐⭐⭐ | ⭐⭐ | Weak instruments |
| PSM | Selection on observables | Conditional independence | ⭐⭐ | ⭐⭐⭐ | Hidden bias |

Summary

Key Points

  1. SUTVA Assumption

    • No spillover effects
    • Treatment consistency
    • Violation → Cluster randomization
  2. Independence Assumption

    • RCT satisfies it automatically
    • Verified with balance checks
  3. Internal Validity

    • Whether causal inference is correct within sample
    • Threats: Selection bias, attrition, Hawthorne
  4. External Validity

    • Whether generalizable to other populations
    • Improve: Multi-site, heterogeneity analysis, re-weighting
  5. Sensitivity Analysis

    • Assess impact of assumption violations
    • Methods: Omitted variables, Lee Bounds, Placebo

Practice Questions

  1. Understanding question: Explain what "high internal validity but low external validity" means, with examples.

  2. Case question: A drug RCT finds:

    • Treatment group attrition rate 30%, control group 15%
    • Complete data shows ATE = 10

    Questions:

    • (a) Is this ATE credible? Why or why not?
    • (b) How to respond?
  3. Design question: You're designing an education RCT and worried about spillover effects (classmates influencing each other).

    • (a) How would SUTVA be violated?
    • (b) How to modify experimental design?
    • (c) If you must randomize within classrooms, how to analyze data?
Answer hints

Question 1:

  • High internal validity: Causal inference correct within sample (e.g., rigorous RCT)
  • Low external validity: Sample doesn't represent population (e.g., experiment only at elite schools)
  • Example: Stanford online course experiment → Generalize to community colleges?

Question 2:

  • (a) Not credible! Large difference in attrition rates, likely differential attrition bias
  • (b) Use Lee Bounds to estimate effect bounds, or IPW method

Question 3:

  • (a) Treated students will influence control students (peer effects)
  • (b) Randomize by classroom (cluster randomization)
  • (c) Use cluster-robust standard errors (see the sketch below)
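
A minimal sketch of cluster-robust inference with statsmodels, on hypothetical data with a classroom identifier (`class_id` is assumed here and not part of the examples above):

python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: students nested in classrooms, treatment randomized within classrooms
np.random.seed(1)
df = pd.DataFrame({'class_id': np.repeat(range(20), 25)})
df['treatment'] = np.random.binomial(1, 0.5, len(df))
class_effect = np.random.normal(0, 5, 20)[df['class_id']]
df['score'] = 60 + 8 * df['treatment'] + class_effect + np.random.normal(0, 10, len(df))

# OLS with standard errors clustered at the classroom level
X = sm.add_constant(df[['treatment']])
model = sm.OLS(df['score'], X).fit(cov_type='cluster', cov_kwds={'groups': df['class_id']})
print(model.summary().tables[1])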

Next Steps

In the next section, we'll do Complete Python Practice, integrating all knowledge from this chapter to analyze a real RCT dataset.

Ready? 🚀


References:

  • Rubin, D. B. (1980). "Randomization analysis of experimental data: The Fisher randomization test comment". JASA.
  • Lee, D. S. (2009). "Training, wages, and sample selection: Estimating sharp bounds on treatment effects". Review of Economic Studies.
  • Cinelli, C., & Hazlett, C. (2020). "Making sense of sensitivity: Extending omitted variable bias". Journal of the Royal Statistical Society: Series B.
  • Manski, C. F. (1990). "Nonparametric bounds on treatment effects". American Economic Review.
