2.5 Identification Strategies and Validity
"No causation without manipulation."— Paul Holland, Statistician
Core Assumptions and Threats in Causal Inference
Section Objectives
- Understand core assumptions of causal identification (SUTVA, independence)
- Master the distinction between internal and external validity
- Identify threats to RCT validity
- Learn sensitivity analysis methods
Core Assumptions of Causal Identification
What is "Identification"?
Causal identification: Whether we can uniquely determine the causal effect from observed data
Intuitive understanding:
- ❌ Not identifiable: Data is compatible with multiple different causal effects (cannot distinguish)
- ✅ Identifiable: Data is compatible with only one causal effect (can be estimated)
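To make this concrete, here is a minimal simulation (an illustration added here, with made-up numbers) of two data-generating processes: in one the treatment truly raises the outcome by 10 and assignment is random; in the other the treatment does nothing and units with better baselines simply select into it. Both produce a positive naive difference in means, so that difference alone cannot pin down the causal effect.
import numpy as np
# Two hypothetical data-generating processes (DGPs) with different true effects
rng = np.random.default_rng(0)
n = 100_000
# DGP A: true effect = 10, treatment randomly assigned
d_a = rng.binomial(1, 0.5, n)
y_a = 50 + 10 * d_a + rng.normal(0, 5, n)
diff_a = y_a[d_a == 1].mean() - y_a[d_a == 0].mean()
# DGP B: true effect = 0, but units with better baselines select into treatment
baseline = rng.normal(50, 5, n)
d_b = rng.binomial(1, 1 / (1 + np.exp(-(baseline - 50))))
y_b = baseline + rng.normal(0, 5, n)  # treatment has no effect at all
diff_b = y_b[d_b == 1].mean() - y_b[d_b == 0].mean()
print(f"DGP A (true effect 10): naive difference = {diff_a:.2f}")
print(f"DGP B (true effect 0):  naive difference = {diff_b:.2f}")
# Both naive comparisons are positive; without knowing the assignment mechanism,
# the observed data cannot distinguish a real effect from pure selection.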
1️⃣ SUTVA Assumption
Definition
SUTVA: the Stable Unit Treatment Value Assumption
Contains two sub-assumptions:
(1) No Interference
Meaning: An individual's potential outcome depends only on that individual's own treatment status, not on the treatment status of other individuals
Cases:
- ✅ Satisfied: Drug trial (each person takes medicine independently)
- ❌ Violated: Vaccination (herd immunity effect), education policy (peer effects)
(2) No Hidden Variations of Treatment
Meaning: The treatment is delivered in a single, well-defined form; there are no hidden versions that could have different effects
Cases:
❌ Violated: "Online course" may include:
- Live lectures (high interaction)
- Recorded lectures (low interaction)
- Self-study materials (no interaction)
Different forms of "online courses" may have different effects
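As a quick, made-up illustration of why this matters: if the bundled "online course" is a mix of these formats with different effects, the measured effect is just the mix-weighted average, and it shifts whenever the mix of versions shifts.
# Hypothetical format-specific effects of "the online course"
effects = {'live': 15.0, 'recorded': 8.0, 'self_study': 2.0}
def blended_effect(mix):
    """Effect of the bundled treatment = mix-weighted average of format effects."""
    return sum(effects[fmt] * share for fmt, share in mix.items())
mix_pilot = {'live': 0.7, 'recorded': 0.2, 'self_study': 0.1}   # pilot: mostly live
mix_scaled = {'live': 0.1, 'recorded': 0.4, 'self_study': 0.5}  # at scale: mostly self-study
print(f"Pilot mix effect:  {blended_effect(mix_pilot):.1f}")
print(f"Scaled mix effect: {blended_effect(mix_scaled):.1f}")
# Same "treatment" label, different hidden versions -> different measured effect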
Consequences of SUTVA Violation
import numpy as np
import pandas as pd
# Simulate spillover effects
np.random.seed(42)
n = 100
# Generate social network (neighbor relationships)
friends = np.random.randint(0, n, size=(n, 3)) # Each person has 3 friends
# Random assignment
data = pd.DataFrame({
'id': range(n),
'treatment': np.random.binomial(1, 0.5, n)
})
# Calculate proportion of friends receiving treatment
data['friends_treated'] = 0.0  # float column, since we will store proportions
for i in range(n):
friend_ids = friends[i]
data.loc[i, 'friends_treated'] = data.loc[friend_ids, 'treatment'].mean()
# Outcome variable (includes spillover effects)
# Both own treatment + friends' treatment have effects
data['Y'] = (100 +
30 * data['treatment'] + # Direct effect
15 * data['friends_treated'] + # Spillover effect
np.random.normal(0, 10, n))
# Simple comparison mixes direct and spillover effects
simple_diff = (data[data['treatment'] == 1]['Y'].mean() -
data[data['treatment'] == 0]['Y'].mean())
print(f"Simple comparison: {simple_diff:.2f}")
print("True direct effect: 30")
print("Spillover effect: 15 × (proportion of friends treated)")
print("\n⚠️ SUTVA violation → Simple comparison estimate is biased!")Addressing SUTVA Violations
| Method | Applicable Scenario | Example |
|---|---|---|
| Cluster randomization | Spillover effects limited within groups | Randomize by school |
| Two-stage experiments | Randomize groups and individuals | First random villages, then random villagers |
| Network experimental design | Known social network structure | Randomize non-adjacent nodes |
| Structural models | Estimate direct and spillover effects | Spatial econometric models |
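To illustrate the first row of this table, here is a minimal sketch (with a hypothetical school roster) of cluster randomization: the treatment draw happens once per school, so every student in a school shares the same status and within-school spillovers stay inside a single arm.
import numpy as np
import pandas as pd
rng = np.random.default_rng(42)
# Hypothetical roster: 20 schools with 50 students each
schools = [f"school_{s}" for s in range(20)]
students = pd.DataFrame({
    'student_id': range(20 * 50),
    'school': np.repeat(schools, 50)
})
# Cluster randomization: one treatment draw per SCHOOL, not per student
school_assignment = pd.Series(rng.binomial(1, 0.5, len(schools)), index=schools)
students['treatment'] = students['school'].map(school_assignment)
# Every student in a school shares the same status
print(students.groupby('school')['treatment'].nunique().max())  # -> 1, no mixed schools
print(f"Share of students treated: {students['treatment'].mean():.2f}")
The analysis should then respect the clustered assignment, for example with cluster-robust standard errors (a sketch appears in the answer hints at the end of this section).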
2️⃣ Independence Assumption (Unconfoundedness)
Independence Under Randomization
Strong independence: $(Y_i(1), Y_i(0)) \perp D_i$
Meaning: The potential outcomes are statistically independent of treatment assignment $D_i$
Core advantage of RCT: Randomization automatically satisfies this assumption
Conditional Independence
In observational studies (without randomization), we must instead impose the conditional version of this assumption: $(Y_i(1), Y_i(0)) \perp D_i \mid X_i$
Meaning: Given the observed covariates $X_i$, treatment assignment is independent of the potential outcomes
Common name: "Selection on Observables"
Testing Independence: Balance Checks
from scipy import stats
import pandas as pd
import numpy as np
def independence_test(data, covariates, treatment_col='treatment'):
"""
Test independence between treatment assignment and covariates
"""
results = []
for cov in covariates:
if data[cov].dtype in ['float64', 'int64']:
# Continuous variable: t-test
treated = data[data[treatment_col] == 1][cov]
control = data[data[treatment_col] == 0][cov]
stat, p_value = stats.ttest_ind(treated, control)
test_type = 't-test'
else:
# Categorical variable: chi-square test
contingency = pd.crosstab(data[cov], data[treatment_col])
stat, p_value, _, _ = stats.chi2_contingency(contingency)
test_type = 'chi2'
results.append({
'Covariate': cov,
'Test type': test_type,
'Statistic': stat,
'p-value': p_value,
'Balanced': '✓' if p_value > 0.05 else '✗'
})
return pd.DataFrame(results)
# Example
np.random.seed(42)
n = 200
data = pd.DataFrame({
'treatment': np.random.binomial(1, 0.5, n),
'age': np.random.normal(30, 10, n),
'income': np.random.lognormal(10, 1, n),
'gender': np.random.choice(['M', 'F'], n),
'education': np.random.choice(['High School', 'Bachelor', 'Graduate'], n)
})
balance_results = independence_test(
data,
covariates=['age', 'income', 'gender', 'education']
)
print(balance_results)
3️⃣ Internal Validity
Definition
Internal validity: Whether causal inference is correct within the study sample
Meaning: Whether the estimator can unbiasedly and consistently estimate the causal effect within the sample
Threats
1. Selection Bias
Source: Treatment and control groups have non-comparable baselines
Situations that can occur even in an RCT:
- Randomization failure (technical issues)
- Sample size too small (random fluctuation)
- Improper stratification
Test: Balance checks (see above)
Responses:
- Re-randomize
- Regression control for the unbalanced variables
- Matching or IPW (inverse probability weighting); a sketch of the last two follows below
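Here is a minimal, self-contained sketch (simulated data, not from the example above) of the last two responses: regression control for an imbalanced covariate, and inverse probability weighting based on an estimated propensity score.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
rng = np.random.default_rng(0)
n = 1000
# Simulated imbalance: older units are more likely to be treated, and age also raises the outcome
age = rng.normal(30, 10, n)
treatment = rng.binomial(1, 1 / (1 + np.exp(-(age - 30) / 10)))
y = 50 + 5 * treatment + 0.8 * age + rng.normal(0, 5, n)  # true effect = 5
df = pd.DataFrame({'treatment': treatment, 'age': age, 'y': y})
# Naive difference in means (biased by the age imbalance)
naive = df[df['treatment'] == 1]['y'].mean() - df[df['treatment'] == 0]['y'].mean()
# (1) Regression control for the unbalanced covariate
reg_est = sm.OLS(df['y'], sm.add_constant(df[['treatment', 'age']])).fit().params['treatment']
# (2) IPW: weight each unit by the inverse probability of its observed assignment
ps = LogisticRegression().fit(df[['age']], df['treatment']).predict_proba(df[['age']])[:, 1]
w = np.where(df['treatment'] == 1, 1 / ps, 1 / (1 - ps))
treated, control = df['treatment'] == 1, df['treatment'] == 0
ipw_est = (np.average(df.loc[treated, 'y'], weights=w[treated]) -
           np.average(df.loc[control, 'y'], weights=w[control]))
print(f"Naive: {naive:.2f} | Regression: {reg_est:.2f} | IPW: {ipw_est:.2f} | True effect: 5")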
2. Attrition
Definition: Some participants drop out, causing missing outcome data
# Simulate attrition
np.random.seed(42)
n = 500
data = pd.DataFrame({
'treatment': np.random.binomial(1, 0.5, n),
'baseline_health': np.random.normal(50, 10, n)
})
# Potential outcomes
data['Y0'] = 50 + 0.5 * data['baseline_health'] + np.random.normal(0, 5, n)
data['Y1'] = data['Y0'] + 10 + np.random.normal(0, 5, n)
# Attrition (people with poor health more likely to drop out)
attrition_prob = 1 / (1 + np.exp((data['baseline_health'] - 40) / 5))
data['attrited'] = np.random.binomial(1, attrition_prob)
# Observed outcome (only non-attrited have data)
data['Y_obs'] = np.where(data['treatment'] == 1, data['Y1'], data['Y0'])
data.loc[data['attrited'] == 1, 'Y_obs'] = np.nan
# Complete sample ATE
complete_ATE = data['Y1'].mean() - data['Y0'].mean()
# ATE with attrition
observed_data = data.dropna(subset=['Y_obs'])
attrited_ATE = (observed_data[observed_data['treatment'] == 1]['Y_obs'].mean() -
observed_data[observed_data['treatment'] == 0]['Y_obs'].mean())
print(f"Complete sample ATE: {complete_ATE:.2f}")
print(f"ATE with attrition: {attrited_ATE:.2f}")
print(f"Bias: {attrited_ATE - complete_ATE:.2f}")
# Attrition rate comparison
attrition_by_treatment = data.groupby('treatment')['attrited'].mean()
print("\nAttrition rates:")
print(attrition_by_treatment)
Detection methods:
- Compare attrition rates between groups (should be similar)
- Check baseline characteristics of attrited vs retained
Response methods:
- Lee Bounds: Estimate upper and lower bounds on the effect
- IPW: Inverse probability weighting by the estimated retention probability (sketch below)
- Sensitivity analysis: Assume different attrition mechanisms
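Continuing the attrition simulation above (the `data` frame with `baseline_health`, `attrited`, and `Y_obs`), here is a minimal sketch of the IPW response: model the probability of being retained from baseline covariates and weight each observed participant by the inverse of that probability. The Lee Bounds response is implemented in the sensitivity-analysis part of this section.
import numpy as np
from sklearn.linear_model import LogisticRegression
# 1. Model the probability of being RETAINED from baseline covariates
retention_model = LogisticRegression()
retention_model.fit(data[['baseline_health', 'treatment']], 1 - data['attrited'])
data['p_retained'] = retention_model.predict_proba(data[['baseline_health', 'treatment']])[:, 1]
# 2. Keep the observed participants and weight them by 1 / P(retained),
#    so that those who resemble the drop-outs count for more
obs = data[data['attrited'] == 0].copy()
obs['w'] = 1 / obs['p_retained']
treated, control = obs['treatment'] == 1, obs['treatment'] == 0
ipw_ate = (np.average(obs.loc[treated, 'Y_obs'], weights=obs.loc[treated, 'w']) -
           np.average(obs.loc[control, 'Y_obs'], weights=obs.loc[control, 'w']))
print(f"IPW-adjusted ATE: {ipw_ate:.2f}  (compare with the estimates printed above)")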
3. Hawthorne Effect (Observer Effect)
Definition: Participants change behavior knowing they're being observed
Cases:
- Health research: Knowing they're monitored, pay more attention to diet and exercise
- Education experiments: Teachers and students try harder because they're being studied
Response:
- Double-blind design: Neither participants nor researchers know the assignment
- Placebo control group: Control group receives placebo treatment
- Unobtrusive measures: Use administrative data instead of surveys
4. Spillover Effects
See SUTVA section
5. John Henry Effect (opposite of Hawthorne)
Definition: Control group tries extra hard because they don't want to "lose"
Case: Control schools hear another school is piloting a new teaching method, competitively improve teaching quality
4️⃣ External Validity
Definition
External validity: Whether the causal effect can generalize to other populations
Threats
1. Sample Selection Bias
Problem: Experimental sample doesn't represent target population
Cases:
- College student sample → Generalize to society?
- Volunteer sample → Generalize to mandatory participation?
- Single region → Generalize nationwide?
# Simulate sample selection bias
np.random.seed(42)
# Population (N = 10,000)
population = pd.DataFrame({
'id': range(10000),
'ability': np.random.normal(100, 20, 10000)
})
# True ATE (in population)
population['tau'] = 10 + 0.1 * population['ability']
true_population_ATE = population['tau'].mean()
# Experimental sample (only high-ability people participate)
sample_prob = 1 / (1 + np.exp(-(population['ability'] - 100) / 10))
sample = population[np.random.binomial(1, sample_prob, 10000) == 1].copy()
# Conduct RCT in sample
sample['treatment'] = np.random.binomial(1, 0.5, len(sample))
sample['Y0'] = 50 + 0.5 * sample['ability']
sample['Y1'] = sample['Y0'] + sample['tau']
sample['Y_obs'] = np.where(sample['treatment'] == 1, sample['Y1'], sample['Y0'])
# Estimate ATE (in sample)
sample_ATE = (sample[sample['treatment'] == 1]['Y_obs'].mean() -
sample[sample['treatment'] == 0]['Y_obs'].mean())
print(f"Population ATE: {true_population_ATE:.2f}")
print(f"Sample ATE: {sample_ATE:.2f}")
print(f"External validity bias: {sample_ATE - true_population_ATE:.2f}")
print(f"\nPopulation average ability: {population['ability'].mean():.2f}")
print(f"Sample average ability: {sample['ability'].mean():.2f}")2. Experimental Environment vs Real Environment
Problem: Experimental conditions too idealized
Cases:
- Laboratory experiments → Real decision environment
- Small-scale pilot → Large-scale rollout
- Short-term effects → Long-term effects
3. Context-Dependent Treatment Effects
Problem: Effects depend on specific background conditions
Cases:
- Online education works well during pandemic → After pandemic?
- Policy effective during economic boom → During recession?
Methods to Improve External Validity
| Method | Description |
|---|---|
| Multi-site replication | Repeat in different regions, different times |
| Heterogeneity analysis | Study effects across subgroups, identify boundary conditions |
| Meta-analysis | Synthesize results from multiple studies |
| Re-weighting | Weight sample by population distribution |
# Re-weighting to improve external validity
from sklearn.linear_model import LogisticRegression
# 1. Estimate sample selection probability (propensity score)
X = population[['ability']]
y = np.isin(population['id'], sample['id']).astype(int)
ps_model = LogisticRegression()
ps_model.fit(X, y)
population['ps'] = ps_model.predict_proba(X)[:, 1]
# 2. Calculate weights
sample_with_weights = population[population['id'].isin(sample['id'])].copy()
sample_with_weights['weight'] = 1 / sample_with_weights['ps']
# 3. Weighted ATE estimation
# (Simplified: in a real application the weights would be combined with the treatment assignment, not applied to the true tau)
weighted_mean_tau = (sample_with_weights['tau'] * sample_with_weights['weight']).sum() / sample_with_weights['weight'].sum()
print(f"Unweighted ATE: {sample['tau'].mean():.2f}")
print(f"Weighted ATE: {weighted_mean_tau:.2f}")
print(f"Population ATE: {true_population_ATE:.2f}")5️⃣ Sensitivity Analysis
Purpose
Assess robustness of results to assumption violations
Method 1: Omitted Variable Bias Analysis
Question: If unobserved confounders exist, how much would results change?
def sensitivity_analysis_omitted_variable(data, treatment, outcome,
                                          r2_confounder_treatment,
                                          r2_confounder_outcome):
"""
Omitted variable sensitivity analysis
Parameters:
- r2_confounder_treatment: Proportion of treatment variation explained by omitted variable
- r2_confounder_outcome: Proportion of outcome variation explained by omitted variable
"""
import statsmodels.api as sm
# Observed ATE
X = sm.add_constant(data[treatment])
model = sm.OLS(data[outcome], X).fit()
observed_ATE = model.params[treatment]
# Calculate bias
# Rough heuristic in the spirit of Cinelli & Hazlett (2020); their exact bias formula uses partial R² of the residualized treatment and outcome
bias = np.sqrt(r2_confounder_treatment * r2_confounder_outcome) * data[outcome].std()
# Adjusted estimates
adjusted_ATE_upper = observed_ATE + bias
adjusted_ATE_lower = observed_ATE - bias
return {
'Observed ATE': observed_ATE,
'Bias upper bound': bias,
'Adjusted ATE upper bound': adjusted_ATE_upper,
'Adjusted ATE lower bound': adjusted_ATE_lower
}
# Example
result = sensitivity_analysis_omitted_variable(
sample,
treatment='treatment',
outcome='Y_obs',
r2_confounder_treatment=0.1, # Omitted variable explains 10% of treatment variation
r2_confounder_outcome=0.2 # Omitted variable explains 20% of outcome variation
)
print("Sensitivity analysis results:")
for k, v in result.items():
print(f" {k}: {v:.2f}")Method 2: Lee Bounds for Attrition
def lee_bounds(data, treatment, outcome, attrited):
    """
    Lee (2009) bounds: handle selection bias from differential attrition by
    trimming the outcome distribution of the arm with the LOWER attrition rate.
    """
    # Attrition rates by arm
    attrition_treated = data[data[treatment] == 1][attrited].mean()
    attrition_control = data[data[treatment] == 0][attrited].mean()
    # Non-attrited sample
    observed = data[data[attrited] == 0]
    treated_obs = observed[observed[treatment] == 1][outcome]
    control_obs = observed[observed[treatment] == 0][outcome]
    # Simple estimate (no trimming)
    simple_ATE = treated_obs.mean() - control_obs.mean()
    if attrition_control >= attrition_treated:
        # Treatment arm retained more units: trim treated outcomes
        trim_prop = (attrition_control - attrition_treated) / (1 - attrition_treated)
        n_trim = int(round(len(treated_obs) * trim_prop))
        srt = treated_obs.sort_values()
        # Upper bound: drop the n_trim LOWEST treated outcomes
        upper_bound = srt.iloc[n_trim:].mean() - control_obs.mean()
        # Lower bound: drop the n_trim HIGHEST treated outcomes
        lower_bound = srt.iloc[:len(srt) - n_trim].mean() - control_obs.mean()
    else:
        # Control arm retained more units: trim control outcomes instead
        trim_prop = (attrition_treated - attrition_control) / (1 - attrition_control)
        n_trim = int(round(len(control_obs) * trim_prop))
        srt = control_obs.sort_values()
        # Upper bound: drop the n_trim HIGHEST control outcomes
        upper_bound = treated_obs.mean() - srt.iloc[:len(srt) - n_trim].mean()
        # Lower bound: drop the n_trim LOWEST control outcomes
        lower_bound = treated_obs.mean() - srt.iloc[n_trim:].mean()
    return {
        'Simple estimate': simple_ATE,
        'Lee lower bound': lower_bound,
        'Lee upper bound': upper_bound
    }
# Use previous attrition data
bounds = lee_bounds(data, 'treatment', 'Y_obs', 'attrited')
print("\nLee Bounds:")
for k, v in bounds.items():
if not np.isnan(v):
print(f" {k}: {v:.2f}")Method 3: Placebo Test
def placebo_test(data, treatment, outcome, placebo_treatment):
"""
Placebo test: Use a "fake treatment" that should have no effect
"""
import statsmodels.api as sm
# Real treatment effect
X_real = sm.add_constant(data[treatment])
model_real = sm.OLS(data[outcome], X_real).fit()
real_effect = model_real.params[treatment]
real_pvalue = model_real.pvalues[treatment]
# Placebo treatment "effect"
X_placebo = sm.add_constant(data[placebo_treatment])
model_placebo = sm.OLS(data[outcome], X_placebo).fit()
placebo_effect = model_placebo.params[placebo_treatment]
placebo_pvalue = model_placebo.pvalues[placebo_treatment]
print("Placebo test:")
print(f" Real treatment effect: {real_effect:.2f} (p={real_pvalue:.4f})")
print(f" Placebo effect: {placebo_effect:.2f} (p={placebo_pvalue:.4f})")
if placebo_pvalue > 0.05:
print(" ✓ Placebo not significant, passes test")
else:
print(" ✗ Placebo significant, possible issues (e.g., selection bias)")
# Example: use the treatment vector shifted by one position as a "fake treatment"
# (with random assignment, another unit's treatment status should have no effect here)
sample['placebo_treatment'] = np.roll(sample['treatment'].values, 1)
placebo_test(sample, 'treatment', 'Y_obs', 'placebo_treatment')
Identification Strategy Comparison
| Strategy | Identification Source | Core Assumption | Internal Validity | External Validity | Common Threats |
|---|---|---|---|---|---|
| RCT | Random assignment | SUTVA | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | Attrition, Hawthorne |
| DID | Parallel trends | No differential trends | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Trend violations, reverse causality |
| RDD | Continuity | Continuity assumption | ⭐⭐⭐⭐ | ⭐⭐ | Manipulation, functional form |
| IV | Exogenous shock | Exclusion restriction | ⭐⭐⭐ | ⭐⭐⭐ | Weak instruments |
| PSM | Selection on observables | Conditional independence | ⭐⭐ | ⭐⭐⭐ | Hidden bias |
Summary
Key Points
SUTVA Assumption
- No spillover effects
- Treatment consistency
- Violation → Cluster randomization
Independence Assumption
- RCT automatically satisfies
- Balance checks verify
Internal Validity
- Whether causal inference is correct within sample
- Threats: Selection bias, attrition, Hawthorne
External Validity
- Whether generalizable to other populations
- Improve: Multi-site, heterogeneity analysis, re-weighting
Sensitivity Analysis
- Assess impact of assumption violations
- Methods: Omitted variables, Lee Bounds, Placebo
Practice Questions
Understanding question: Explain what "high internal validity but low external validity" means, with examples.
Case question: A drug RCT finds:
- Treatment group attrition rate 30%, control group 15%
- Complete-case analysis (non-attrited participants only) shows ATE = 10
Questions:
- (a) Is this ATE credible? Why or why not?
- (b) How to respond?
Design question: You're designing an education RCT and worried about spillover effects (classmates influencing each other).
- (a) How would SUTVA be violated?
- (b) How to modify experimental design?
- (c) If you must randomize within classrooms, how to analyze data?
Answer hints
Question 1:
- High internal validity: Causal inference correct within sample (e.g., rigorous RCT)
- Low external validity: Sample doesn't represent population (e.g., experiment only at elite schools)
- Example: Stanford online course experiment → Generalize to community colleges?
Question 2:
- (a) Not credible! Large difference in attrition rates, likely differential attrition bias
- (b) Use Lee Bounds to estimate effect bounds, or IPW method
Question 3:
- (a) Treated students will influence control students (peer effects)
- (b) Randomize by classroom (cluster randomization)
- (c) Use Cluster-Robust standard errors
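For part (c), a minimal sketch (with a hypothetical classroom variable and made-up effect sizes) of cluster-robust standard errors in statsmodels:
import numpy as np
import pandas as pd
import statsmodels.api as sm
rng = np.random.default_rng(7)
# Hypothetical within-classroom randomization: 30 classrooms, 20 students each
df = pd.DataFrame({
    'classroom': np.repeat(range(30), 20),
    'treatment': rng.binomial(1, 0.5, 600)
})
classroom_shock = rng.normal(0, 5, 30)  # shared classroom-level shock
df['score'] = 60 + 8 * df['treatment'] + classroom_shock[df['classroom']] + rng.normal(0, 10, 600)
# OLS with standard errors clustered at the classroom level
X = sm.add_constant(df['treatment'])
model = sm.OLS(df['score'], X).fit(cov_type='cluster', cov_kwds={'groups': df['classroom']})
print(model.summary().tables[1])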
Next Steps
In the next section, we'll do Complete Python Practice, integrating all knowledge from this chapter to analyze a real RCT dataset.
Ready? 🚀
References:
- Rubin, D. B. (1980). "Randomization analysis of experimental data: The Fisher randomization test comment". JASA.
- Lee, D. S. (2009). "Training, wages, and sample selection: Estimating sharp bounds on treatment effects". Review of Economic Studies.
- Cinelli, C., & Hazlett, C. (2020). "Making sense of sensitivity: Extending omitted variable bias". Journal of the Royal Statistical Society: Series B.
- Manski, C. F. (1990). "Nonparametric bounds on treatment effects". American Economic Review.