2.2 Potential Outcomes Framework
"Statistics is the science of inferring things we cannot see from things we can see."— Donald Rubin, 2016 DeGroot Prize Winner (awarded by the American Statistical Association)
The Foundation of Causal Inference: Understanding Counterfactual Logic
Section Objectives
- Understand the core idea of the Rubin Causal Model (RCM)
- Master the mathematical representation of potential outcomes
- Recognize the fundamental problem of causal inference
- Distinguish individual causal effects from average causal effects
Starting with a Question
Case: Effect of Job Training on Income
Research Question: Does attending job training increase income?
Observed Data:
| Name | Attended Training | Monthly Income (USD) |
|---|---|---|
| Zhang | Yes | 8,000 |
| Li | No | 6,000 |
| Wang | Yes | 9,000 |
| Zhao | No | 5,500 |
Simple Comparison:
- Training group average income: (8000 + 9000) / 2 = 8,500
- Non-training group average income: (6000 + 5500) / 2 = 5,750
- Difference: 8500 - 5750 = 2,750
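The naive comparison above takes only a few lines of pandas. This is a minimal sketch. The DataFrame `observed` and its column names are illustrative choices and are not part of the original example.

```python
import pandas as pd

# Observed data from the table above (D = 1 means the person attended training)
observed = pd.DataFrame({
    "name": ["Zhang", "Li", "Wang", "Zhao"],
    "D": [1, 0, 1, 0],
    "income": [8000, 6000, 9000, 5500],
})

# Average observed income in each group, then the difference between groups
group_means = observed.groupby("D")["income"].mean()
naive_diff = group_means.loc[1] - group_means.loc[0]
print(naive_diff)  # 2750.0
```

Nothing in this calculation can tell us whether the 2,750 gap comes from the training itself or from differences between the people who chose to attend.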
Does this prove training causes a 2,750 increase in income?
❌ No! There is selection bias:
- People who attend training may already have higher ability (they would earn more even without training)
- People who don't attend may also differ, for example in their baseline earnings or in how much they would gain from training
The Correct Causal Question:
- Zhang attended training and earns 8,000
- What if Zhang had not attended training? (counterfactual)
- The difference between these two is the causal effect of training on Zhang
Problem: We cannot observe "Zhang's income without training" (he already attended)
This is the Fundamental Problem of Causal Inference.
Potential Outcomes Framework
Core Idea
The Rubin Causal Model (Rubin, 1974) posits that each individual has a potential outcome under every treatment state, even though only one of them is ever realized.
Notation
For individual $i$:
| Symbol | Meaning | Term |
|---|---|---|
| $D_i \in \{0, 1\}$ | Treatment indicator variable | Treatment Indicator |
| $D_i = 1$ | Receives treatment | Treated |
| $D_i = 0$ | Does not receive treatment | Control |
| $Y_i(1)$ | Potential outcome under treatment | Potential Outcome under Treatment |
| $Y_i(0)$ | Potential outcome under control | Potential Outcome under Control |
| $Y_i$ | Observed outcome | Observed Outcome |
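Observation Rule (Switching Equation)
$$Y_i = D_i \, Y_i(1) + (1 - D_i) \, Y_i(0)$$
Meaning:
- If $D_i = 1$ (receives treatment), we observe $Y_i = Y_i(1)$
- If $D_i = 0$ (does not receive treatment), we observe $Y_i = Y_i(0)$
- Key: We can never observe both $Y_i(1)$ and $Y_i(0)$ for the same individual at the same time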
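As a minimal sketch, the switching equation is an element-wise selection. The arrays below assume that the potential outcomes from the "God's perspective" table in the Individual Treatment Effect subsection are known, which is never the case with real data.

```python
import numpy as np

# Potential outcomes for Zhang, Li, Wang, Zhao (assumed known, for illustration only)
y1 = np.array([8000, 7500, 9000, 7000])  # Y_i(1): income with training
y0 = np.array([7000, 6000, 8500, 5500])  # Y_i(0): income without training
d = np.array([1, 0, 1, 0])               # D_i: actual choices

# Switching equation: Y_i = D_i * Y_i(1) + (1 - D_i) * Y_i(0)
y_obs = d * y1 + (1 - d) * y0
print(y_obs)  # [8000 6000 9000 5500]

# Equivalent, more explicit form: pick Y_i(1) where treated, else Y_i(0)
assert np.array_equal(y_obs, np.where(d == 1, y1, y0))
```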
Individual Treatment Effect
Definition
The causal effect for individual $i$ is defined as:
$$\tau_i = Y_i(1) - Y_i(0)$$
Meaning: The difference in outcomes for the same individual under two states
Case: Zhang's Causal Effect
Suppose we have "God's perspective" and know all potential outcomes:
| Name | Income with training $Y_i(1)$ | Income without training $Y_i(0)$ | Causal effect $\tau_i$ | Actual choice $D_i$ | Observed income $Y_i$ |
|---|---|---|---|---|---|
| Zhang | 8,000 | 7,000 | +1,000 | 1 | 8,000 |
| Li | 7,500 | 6,000 | +1,500 | 0 | 6,000 |
| Wang | 9,000 | 8,500 | +500 | 1 | 9,000 |
| Zhao | 7,000 | 5,500 | +1,500 | 0 | 5,500 |
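When both potential outcomes are available, each individual effect is a simple column difference. That is only possible in a constructed example like this one. Below is a pandas sketch. The frame name `god` and its column names are hypothetical.

```python
import pandas as pd

# "God's perspective": both potential outcomes known (never true in practice)
god = pd.DataFrame({
    "name": ["Zhang", "Li", "Wang", "Zhao"],
    "Y1": [8000, 7500, 9000, 7000],  # income with training, Y_i(1)
    "Y0": [7000, 6000, 8500, 5500],  # income without training, Y_i(0)
    "D": [1, 0, 1, 0],               # actual choice, D_i
})

# Individual causal effect: tau_i = Y_i(1) - Y_i(0)
god["tau"] = god["Y1"] - god["Y0"]
print(god[["name", "tau"]].to_string(index=False))
```

The effects differ across people (500 for Wang, 1,500 for Li and Zhao), which is the heterogeneity noted below.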
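Observations:
- Zhang's true causal effect is +1,000 (not his observed income of 8,000, and not the 2,750 group difference)
- Each person's causal effect may differ (heterogeneity)
Real-World Problem:
- We only observe the potential outcome that matches each person's actual choice: $Y_i(1)$ for Zhang and Wang, $Y_i(0)$ for Li and Zhao
- The other potential outcome (the counterfactual) can never be observed
- This is the Fundamental Problem of Causal Inference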
Average Treatment Effect (ATE)
Definition
Since individual causal effects can never be observed directly, we instead estimate the Average Treatment Effect (ATE):
$$\text{ATE} = E[\tau_i] = E[Y_i(1) - Y_i(0)]$$
Meaning: The average of all individual causal effects in the population
Continuing Zhang's Example
Using the "God's perspective" data from the table above:
$$\text{ATE} = \frac{1{,}000 + 1{,}500 + 500 + 1{,}500}{4} = \frac{4{,}500}{4} = 1{,}125$$
Interpretation: The average causal effect of training is +1,125
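Here is the same ATE calculation as a numpy sketch. It uses the potential outcomes from the table above, which are assumed known for illustration. It also checks that the mean of the differences equals the difference of the means. That follows from linearity of expectation: $E[Y_i(1) - Y_i(0)] = E[Y_i(1)] - E[Y_i(0)]$.

```python
import numpy as np

y1 = np.array([8000, 7500, 9000, 7000])  # Y_i(1) for Zhang, Li, Wang, Zhao
y0 = np.array([7000, 6000, 8500, 5500])  # Y_i(0)

tau = y1 - y0        # individual causal effects tau_i
ate = tau.mean()     # (1000 + 1500 + 500 + 1500) / 4
print(ate)           # 1125.0

# Mean of differences equals difference of means (7875 - 6750)
assert ate == y1.mean() - y0.mean()
```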
Selection Bias
Problem with Simple Comparison
Simple Difference in Means:
$$E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = 8{,}500 - 5{,}750 = 2{,}750$$
Comparison with True ATE:
- Simple comparison: 2,750
- True ATE: 1,125
- Bias: 2,750 - 1,125 = 1,625
Source of Selection Bias
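The simple comparison can be decomposed as:
$$\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{Simple comparison}} = \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{ATT}} + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{Selection bias}}$$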
First term (ATT): Average causal effect for the treated group
Second term (Selection bias): Baseline difference between groups without treatment
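Verification:
$$\text{ATT} = \frac{1{,}000 + 500}{2} = 750, \qquad \text{Selection bias} = \frac{7{,}000 + 8{,}500}{2} - \frac{6{,}000 + 5{,}500}{2} = 7{,}750 - 5{,}750 = 2{,}000$$
$$\text{ATT} + \text{Selection bias} = 750 + 2{,}000 = 2{,}750 = \text{Simple comparison}$$
Conclusion:
- People who attend training already have higher ability (higher $Y_i(0)$)
- Simple comparison overestimates the training effect
- Selection bias = 2,000 is caused by self-selection
- The bias relative to the ATE (1,625) is smaller than 2,000 because ATT is below ATE here: 2,750 - 1,125 = 2,000 + (750 - 1,125) = 1,625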
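The decomposition can be verified numerically on the four-person example. The numpy sketch below uses illustrative variable names. It recomputes every term from the potential outcomes, which are assumed known as in the table above.

```python
import numpy as np

y1 = np.array([8000, 7500, 9000, 7000])  # Y_i(1): Zhang, Li, Wang, Zhao
y0 = np.array([7000, 6000, 8500, 5500])  # Y_i(0)
d = np.array([1, 0, 1, 0])               # D_i
treated, control = d == 1, d == 0

y_obs = np.where(treated, y1, y0)        # switching equation
naive = y_obs[treated].mean() - y_obs[control].mean()
att = (y1[treated] - y0[treated]).mean()
selection_bias = y0[treated].mean() - y0[control].mean()
ate = (y1 - y0).mean()

print(naive, att, selection_bias)        # 2750.0 750.0 2000.0
assert naive == att + selection_bias     # decomposition holds exactly
print(naive - ate)                       # 1625.0 = selection bias + (ATT - ATE)
```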
Core Challenges in Causal Inference
The Fundamental Problem
Missing Data Problem:
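| Individual | $Y_i(1)$ | $Y_i(0)$ | $\tau_i$ | $D_i$ | $Y_i$ |
|---|---|---|---|---|---|
| Zhang | 8,000 | ? | ? | 1 | 8,000 |
| Li | ? | 6,000 | ? | 0 | 6,000 |
| Wang | 9,000 | ? | ? | 1 | 9,000 |
| Zhao | ? | 5,500 | ? | 0 | 5,500 |
Half of the potential outcomes are missing by construction. Which half is missing depends on each person's self-selected treatment, so the data are not missing at random!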
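The sketch below makes the missingness concrete. It masks each person's counterfactual with `NaN`, assuming the four-person potential outcomes from earlier in this section. It also shows why the missingness is not random. The observed $Y_i(0)$ values average 5,750, while the true mean of $Y_i(0)$ over all four people is 6,750.

```python
import numpy as np

y1 = np.array([8000, 7500, 9000, 7000], dtype=float)  # true Y_i(1)
y0 = np.array([7000, 6000, 8500, 5500], dtype=float)  # true Y_i(0)
d = np.array([1, 0, 1, 0])                             # D_i

# What a researcher actually sees: NaN marks each counterfactual
y1_seen = np.where(d == 1, y1, np.nan)
y0_seen = np.where(d == 0, y0, np.nan)
tau_seen = y1_seen - y0_seen  # NaN for everyone: no tau_i is observable

share_missing = np.isnan(np.concatenate([y1_seen, y0_seen])).mean()
print(share_missing)                      # 0.5
print(np.nanmean(y0_seen), y0.mean())     # 5750.0 6750.0
```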
Three Major Challenges
| Challenge | Description | Consequence |
|---|---|---|
| Selection Bias | Treatment and control groups differ inherently | Simple comparison is biased |
| Confounding | Third variable affects both treatment and outcome | Cannot isolate causation |
| Reverse Causality | Outcome affects treatment | Wrong causal direction |
Conditions for Causal Identification
Conditions for Ideal Experiment
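To estimate the ATE without bias from a simple comparison, we need:
$$E[Y_i(0) \mid D_i = 1] = E[Y_i(0) \mid D_i = 0]$$
Meaning: Had neither group received treatment, their average outcomes would have been the same. This removes selection bias, so the simple comparison equals the ATT. It equals the ATE if, in addition, $E[Y_i(1) \mid D_i = 1] = E[Y_i(1) \mid D_i = 0]$.
A stronger condition that guarantees both:
- Treatment assignment is independent of potential outcomes
- Denoted as: $\{Y_i(1), Y_i(0)\} \perp D_i$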
How to Achieve This Condition?
Answer: Randomization!
- Decide who receives treatment by a coin flip
- Makes treatment independent of all characteristics, observed and unobserved, so both groups are comparable on average
- Eliminates selection bias (in expectation)
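A simulation can check the claim that randomization removes selection bias. The data-generating process below is an assumption for illustration. It is similar to the self-selection simulation later in this section, but with a constant effect of 1,500. `naive_diff` is a hypothetical helper. Under self-selection, the naive comparison should come out well above 1,500. Under coin-flip assignment, it should land close to 1,500.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
n = 100_000

# Hypothetical DGP: ability raises income with or without training; effect = 1,500
ability = rng.normal(100, 15, n)
y0 = 5000 + 50 * ability + rng.normal(0, 1000, n)
y1 = y0 + 1500
true_ate = (y1 - y0).mean()  # 1500 up to floating-point rounding


def naive_diff(d):
    """Difference in observed group means under the assignment vector d."""
    y_obs = np.where(d == 1, y1, y0)
    return y_obs[d == 1].mean() - y_obs[d == 0].mean()


# Self-selection: high-ability individuals are more likely to enroll
p_self = 1 / (1 + np.exp(-(ability - 100) / 10))
d_self = rng.binomial(1, p_self)

# Randomization: a fair coin flip, independent of ability and of (Y(0), Y(1))
d_rand = rng.binomial(1, 0.5, n)

print(f"True ATE:            {true_ate:.0f}")
print(f"Self-selected naive: {naive_diff(d_self):.0f}")
print(f"Randomized naive:    {naive_diff(d_rand):.0f}")
```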
The next section will explain RCT (Randomized Controlled Trials) in detail
Python Implementation: Understanding Potential Outcomes
Case: Simulating "God's Perspective" Data
```python
import pandas as pd
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Generate potential outcomes (unobservable in real data)
n = 1000
data = pd.DataFrame({
    'id': range(n),
    'ability': np.random.normal(100, 15, n),  # Latent ability
})

# Potential outcomes (God's perspective)
data['Y0'] = 5000 + 50 * data['ability'] + np.random.normal(0, 1000, n)
data['Y1'] = data['Y0'] + 1500 + np.random.normal(0, 500, n)  # Training effect: +1500 on average

# Individual causal effects
data['tau'] = data['Y1'] - data['Y0']

# True ATE
true_ATE = data['tau'].mean()
print(f"True ATE: {true_ATE:.2f}")

# Self-selection: high-ability individuals are more likely to attend training
prob_treat = 1 / (1 + np.exp(-(data['ability'] - 100) / 10))
data['D'] = np.random.binomial(1, prob_treat)

# Observed outcome via the switching equation (only one potential outcome is seen)
data['Y_obs'] = data['D'] * data['Y1'] + (1 - data['D']) * data['Y0']

# Simple comparison (biased!)
naive_estimate = (data.loc[data['D'] == 1, 'Y_obs'].mean() -
                  data.loc[data['D'] == 0, 'Y_obs'].mean())
print(f"Simple comparison: {naive_estimate:.2f}")
print(f"Bias: {naive_estimate - true_ATE:.2f}")
```
Output: the exact figures depend on the random draws. By construction, the true ATE is close to 1,500. The simple comparison is expected to come out near 2,300. Self-selected participants have an average ability roughly 16 points higher than non-participants, and each ability point adds 50 to $Y_i(0)$, so the bias is roughly 16 × 50 ≈ 800.
Conclusion: Due to selection bias, the simple comparison substantially overestimates the training effect!
Visualization: Potential Outcomes
```python
import matplotlib.pyplot as plt

# Continues from the simulation above: reuses `data` and `true_ATE`
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot 1: joint distribution of the two potential outcomes
axes[0].scatter(data['Y0'], data['Y1'], alpha=0.3, s=10)
# Span the 45° line over the data range (points above it have a positive effect)
lims = [data[['Y0', 'Y1']].min().min(), data[['Y0', 'Y1']].max().max()]
axes[0].plot(lims, lims, 'r--', label='45° line')
axes[0].set_xlabel('Y(0): Income without training')
axes[0].set_ylabel('Y(1): Income with training')
axes[0].set_title("Potential Outcomes (God's Perspective)")
axes[0].legend()

# Plot 2: distribution of individual causal effects
axes[1].hist(data['tau'], bins=50, edgecolor='black', alpha=0.7)
axes[1].axvline(true_ATE, color='red', linestyle='--',
                label=f'Average effect = {true_ATE:.0f}')
axes[1].set_xlabel('Individual causal effect τ_i')
axes[1].set_ylabel('Frequency')
axes[1].set_title('Heterogeneity in Causal Effects')
axes[1].legend()

# Plot 3: Y(0) by group; the shift between the histograms reflects selection bias
treated = data[data['D'] == 1]
control = data[data['D'] == 0]
axes[2].hist(treated['Y0'], bins=30, alpha=0.5, label='Treated group Y(0) (counterfactual)', color='blue')
axes[2].hist(control['Y0'], bins=30, alpha=0.5, label='Control group Y(0) (observed)', color='orange')
axes[2].set_xlabel('Y(0): Income without training')
axes[2].set_ylabel('Frequency')
axes[2].set_title('Selection Bias: Different Baselines')
axes[2].legend()

plt.tight_layout()
plt.show()
```
Different Types of Average Effects
ATE vs ATT vs ATU
| Effect | Definition | Formula | Application Scenario |
|---|---|---|---|
| ATE | Population average effect | $E[Y_i(1) - Y_i(0)]$ | Universal policy rollout |
| ATT | Treated group average effect | $E[Y_i(1) - Y_i(0) \mid D_i = 1]$ | Evaluate participants |
| ATU | Untreated group average effect | $E[Y_i(1) - Y_i(0) \mid D_i = 0]$ | Evaluate non-participants |
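This numpy sketch computes the three averages side by side for the four-person example from this section, with the potential outcomes assumed known. The variable names are illustrative. Li and Zhao, who did not train, have larger individual effects. So ATU exceeds ATT here, and ATE lies between them as a weighted average.

```python
import numpy as np

y1 = np.array([8000, 7500, 9000, 7000])  # Y_i(1): Zhang, Li, Wang, Zhao
y0 = np.array([7000, 6000, 8500, 5500])  # Y_i(0)
d = np.array([1, 0, 1, 0])               # D_i
tau = y1 - y0

ate = tau.mean()             # everyone
att = tau[d == 1].mean()     # those who actually trained
atu = tau[d == 0].mean()     # those who did not
print(ate, att, atu)         # 1125.0 750.0 1500.0

# ATE is the share-weighted average of ATT and ATU
p_treated = d.mean()
assert ate == p_treated * att + (1 - p_treated) * atu
```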
When Does ATE = ATT?
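Condition: Homogeneous Treatment Effect, i.e. $\tau_i = \tau$ for every individual $i$. This is sufficient but not necessary. Under random assignment the treated group is representative of the population, so ATT = ATE even when effects vary.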
Reality: Usually there is heterogeneity
- High-performing students may benefit more from training
- Drug effects may differ between healthy and sick populations
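The toy comparison below uses the treatment pattern from the four-person example. With a constant effect, ATE and ATT coincide no matter who is treated. With the heterogeneous effects from the example table, they diverge. The constant value 1,000 is an arbitrary choice made for illustration.

```python
import numpy as np

d = np.array([1, 0, 1, 0])  # treatment pattern from the four-person example

tau_homog = np.full(4, 1000)                   # homogeneous: every tau_i is 1000
tau_heter = np.array([1000, 1500, 500, 1500])  # heterogeneous: example table

for label, tau in [("homogeneous", tau_homog), ("heterogeneous", tau_heter)]:
    ate, att = tau.mean(), tau[d == 1].mean()
    print(f"{label}: ATE = {ate:.0f}, ATT = {att:.0f}")
```

The loop reports ATE = ATT = 1000 in the homogeneous case. In the heterogeneous case it reports ATE = 1125 and ATT = 750.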
Summary
Core Concepts
| Concept | Meaning |
|---|---|
| Potential Outcomes | Possible outcomes for an individual under different treatment states |
| Counterfactual | State that didn't actually occur (unobservable) |
| Individual Causal Effect | $\tau_i = Y_i(1) - Y_i(0)$ |
| Average Causal Effect | $\text{ATE} = E[Y_i(1) - Y_i(0)]$ |
| Selection Bias | Bias caused by baseline differences between groups |
| Fundamental Problem | Cannot observe both $Y_i(1)$ and $Y_i(0)$ for the same individual |
Key Insights
Correlation ≠ Causation
- Simple comparisons are usually biased
- Bias sources: selection, confounding, reverse causality
Goal of Causal Inference
- Estimate average causal effect (ATE)
- Need to construct comparable control groups
Solutions Preview
- Randomization (RCT): Most credible
- Quasi-experiments (DID, RDD, IV): Second-best options
- Matching methods (PSM): Require strong assumptions
Practice Questions
Conceptual question: Why do we say "$Y_i(1)$ and $Y_i(0)$ both exist, but we can only observe one"? How should we understand this statement?
Case question: Research finds "coffee drinkers live longer on average." Analyze using the potential outcomes framework:
- Define $D_i$ and the potential outcomes $Y_i(1)$, $Y_i(0)$
- What selection biases might exist?
- How would you design an RCT?
Calculation question: Assume the true data is as follows:
| Individual | $Y_i(1)$ | $Y_i(0)$ | $D_i$ |
|---|---|---|---|
| A | 80 | 70 | 1 |
| B | 75 | 65 | 0 |
| C | 90 | 85 | 1 |
| D | 70 | 60 | 0 |
Calculate:
- (a) True ATE
- (b) ATT
- (c) Simple comparison estimate
- (d) Selection bias
Answers:
(a) ATE = [(80-70) + (75-65) + (90-85) + (70-60)] / 4 = [10+10+5+10] / 4 = 8.75
(b) ATT = [(80-70) + (90-85)] / 2 = [10+5] / 2 = 7.5
(c) Simple comparison = E[Y|D=1] - E[Y|D=0] = [(80+90)/2] - [(65+60)/2] = 85 - 62.5 = 22.5
(d) Selection bias = E[Y(0)|D=1] - E[Y(0)|D=0] = [(70+85)/2] - [(65+60)/2] = 77.5 - 62.5 = 15
Check: Simple comparison = ATT + Selection bias = 7.5 + 15 = 22.5. The total bias relative to the ATE, 22.5 - 8.75 = 13.75, is smaller than the selection bias because ATT - ATE = 7.5 - 8.75 = -1.25.
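A short numpy sketch can check these answers. The variable names are illustrative.

```python
import numpy as np

# Practice data: individuals A, B, C, D
y1 = np.array([80, 75, 90, 70])  # Y_i(1)
y0 = np.array([70, 65, 85, 60])  # Y_i(0)
d = np.array([1, 0, 1, 0])       # D_i
t, c = d == 1, d == 0

ate = (y1 - y0).mean()                   # (a)
att = (y1[t] - y0[t]).mean()             # (b)
naive = y1[t].mean() - y0[c].mean()      # (c) observed outcomes by group
sel_bias = y0[t].mean() - y0[c].mean()   # (d)

print(ate, att, naive, sel_bias)         # 8.75 7.5 22.5 15.0
assert naive == att + sel_bias
```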
Next Steps
In the next section, we'll learn about Randomized Controlled Trials (RCT) and see how randomization solves the fundamental problem of causal inference.
Preview of Core Questions:
- Why does randomization eliminate selection bias?
- What are the experimental design principles of RCT?
- How to implement RCT analysis in Python?
Keep going! 🚀
References:
- Rubin, D. B. (1974). "Estimating causal effects of treatments in randomized and nonrandomized studies". Journal of Educational Psychology.
- Holland, P. W. (1986). "Statistics and Causal Inference". Journal of the American Statistical Association.
- Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.