
2.2 Potential Outcomes Framework

"Statistics is the science of inferring things we cannot see from things we can see."— Donald Rubin, 2016 DeGroot Prize Winner (awarded by the American Statistical Association)

The Foundation of Causal Inference: Understanding Counterfactual Logic


Section Objectives

  • Understand the core idea of the Rubin Causal Model (RCM)
  • Master the mathematical representation of potential outcomes
  • Recognize the fundamental problem of causal inference
  • Distinguish individual causal effects from average causal effects

Starting with a Question

Case: Effect of Education Training on Income

Research Question: Does attending job training increase income?

Observed Data:

| Name | Attended Training | Monthly Income (USD) |
|------|-------------------|----------------------|
| Zhang | Yes | 8,000 |
| Li | No | 6,000 |
| Wang | Yes | 9,000 |
| Zhao | No | 5,500 |

Simple Comparison:

  • Training group average income: (8000 + 9000) / 2 = 8,500
  • Non-training group average income: (6000 + 5500) / 2 = 5,750
  • Difference: 8500 - 5750 = 2,750
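This arithmetic is easy to reproduce in pandas (a minimal sketch using only the four observed rows from the table above):

```python
import pandas as pd

# Observed data from the table above
obs = pd.DataFrame({
    'name': ['Zhang', 'Li', 'Wang', 'Zhao'],
    'trained': [1, 0, 1, 0],
    'income': [8000, 6000, 9000, 5500],
})

group_means = obs.groupby('trained')['income'].mean()
print(group_means.loc[1] - group_means.loc[0])  # 2750.0 -- the naive difference in means
```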

Does this prove training causes a 2,750 increase in income?

No! There is selection bias:

  • People who attend training may already have higher ability (would earn more even without training)
  • People who don't attend may have lower ability (training might have limited effect)

The Correct Causal Question:

  • Zhang attended training and earns 8,000
  • What if Zhang had not attended training? (counterfactual)
  • The difference between these two is the causal effect of training on Zhang

Problem: We cannot observe "Zhang's income without training" (he already attended)

This is the Fundamental Problem of Causal Inference.


Potential Outcomes Framework

Core Idea

The Rubin Causal Model (1974) proposes: Each individual has a potential outcome under each treatment state

Notation

For individual $i$:

| Symbol | Meaning |
|--------|---------|
| $D_i$ | Treatment indicator |
| $D_i = 1$ | Receives treatment (treated) |
| $D_i = 0$ | Does not receive treatment (control) |
| $Y_i(1)$ | Potential outcome under treatment |
| $Y_i(0)$ | Potential outcome under control |
| $Y_i$ | Observed outcome |

Observation Rule (Switching Equation)

$$Y_i = D_i \cdot Y_i(1) + (1 - D_i) \cdot Y_i(0)$$

Meaning:

  • If $D_i = 1$ (receives treatment), we observe $Y_i = Y_i(1)$
  • If $D_i = 0$ (does not receive treatment), we observe $Y_i = Y_i(0)$
  • Key: We can never observe both $Y_i(1)$ and $Y_i(0)$ simultaneously
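A two-line sketch of the switching equation in Python (the potential-outcome values are purely illustrative):

```python
# Switching equation: Y_i = D_i * Y_i(1) + (1 - D_i) * Y_i(0)
def observed_outcome(D, Y1, Y0):
    return D * Y1 + (1 - D) * Y0

print(observed_outcome(D=1, Y1=8000, Y0=7000))  # 8000: treatment reveals Y(1), hides Y(0)
print(observed_outcome(D=0, Y1=8000, Y0=7000))  # 7000: control reveals Y(0), hides Y(1)
```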

Individual Treatment Effect

Definition

The causal effect for individual $i$ is defined as:

$$\tau_i = Y_i(1) - Y_i(0)$$

Meaning: The difference in outcomes for the same individual under two states

Case: Zhang's Causal Effect

Suppose we have "God's perspective" and know all potential outcomes:

| Name | Income with training $Y_i(1)$ | Income without training $Y_i(0)$ | Causal effect $\tau_i$ | Actual choice $D_i$ | Observed income $Y_i$ |
|------|------|------|------|------|------|
| Zhang | **8,000** | 7,000 | +1,000 | 1 | 8,000 |
| Li | 7,500 | **6,000** | +1,500 | 0 | 6,000 |
| Wang | **9,000** | 8,500 | +500 | 1 | 9,000 |
| Zhao | 7,000 | **5,500** | +1,500 | 0 | 5,500 |

(Bold marks the potential outcome that is actually observed.)

Observations:

  • Zhang's true causal effect is +1,000 (not 8,000!)
  • Each person's causal effect may differ (heterogeneity)

Real-World Problem:

  • We can only observe the bold numbers (the outcome under each person's actual choice)
  • The other potential outcome for each person (the counterfactual) can never be observed
  • This is the Fundamental Problem of Causal Inference

Average Treatment Effect (ATE)

Definition

Since individual causal effects cannot be identified, we instead estimate the Average Treatment Effect (ATE):

$$\text{ATE} = E[\tau_i] = E[Y_i(1) - Y_i(0)]$$

Meaning: The average of all individual causal effects in the population

Continuing Zhang's Example

Using the "God's perspective" data from the table above:

$$\text{ATE} = \frac{1000 + 1500 + 500 + 1500}{4} = 1125$$

Interpretation: The average causal effect of training is +1,125
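The same calculation in pandas, using the "God's perspective" numbers from the table above (a sketch; in practice the full set of potential outcomes is never available):

```python
import pandas as pd

# "God's perspective": both potential outcomes are known for everyone
god = pd.DataFrame({
    'name': ['Zhang', 'Li', 'Wang', 'Zhao'],
    'Y1':   [8000, 7500, 9000, 7000],   # income with training
    'Y0':   [7000, 6000, 8500, 5500],   # income without training
    'D':    [1, 0, 1, 0],               # actual choice
})

god['tau'] = god['Y1'] - god['Y0']   # individual causal effects
print(god['tau'].tolist())           # [1000, 1500, 500, 1500]
print(god['tau'].mean())             # 1125.0 -- the true ATE
```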


Selection Bias

Problem with Simple Comparison

Simple Difference in Means:

$$E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = \frac{8000 + 9000}{2} - \frac{6000 + 5500}{2} = 8500 - 5750 = 2750$$

Comparison with True ATE:

  • Simple comparison: 2,750
  • True ATE: 1,125
  • Bias: 2,750 - 1,125 = 1,625

Source of Selection Bias

The simple comparison can be decomposed as:

$$\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{Simple comparison}} = \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{ATT}} + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{Selection bias}}$$

First term (ATT): the average causal effect for the treated group

Second term (Selection bias): the baseline difference between the two groups in the no-treatment state

Verification with the "God's perspective" table:

  • ATT = (1,000 + 500) / 2 = 750
  • Selection bias = (7,000 + 8,500) / 2 - (6,000 + 5,500) / 2 = 7,750 - 5,750 = 2,000
  • ATT + Selection bias = 750 + 2,000 = 2,750 = Simple comparison ✓

Conclusion:

  • People who attend training already have higher ability (higher $Y_i(0)$)
  • Simple comparison overestimates the training effect
  • Selection bias = 2,000 is caused by self-selection
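A numerical check of the decomposition with the same four-person data (again only possible from the "God's perspective", because the treated group's $Y_i(0)$ is never observed in reality):

```python
import pandas as pd

god = pd.DataFrame({
    'name': ['Zhang', 'Li', 'Wang', 'Zhao'],
    'Y1':   [8000, 7500, 9000, 7000],
    'Y0':   [7000, 6000, 8500, 5500],
    'D':    [1, 0, 1, 0],
})
god['Y_obs'] = god['D'] * god['Y1'] + (1 - god['D']) * god['Y0']  # switching equation

treated, control = god[god['D'] == 1], god[god['D'] == 0]

naive = treated['Y_obs'].mean() - control['Y_obs'].mean()      # 2750: simple comparison
att = (treated['Y1'] - treated['Y0']).mean()                   # 750: effect on the treated
selection_bias = treated['Y0'].mean() - control['Y0'].mean()   # 2000: baseline difference

print(naive, att + selection_bias)   # 2750.0 2750.0 -- the decomposition holds exactly
```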

Core Challenges in Causal Inference

The Fundamental Problem

Missing Data Problem:

| Individual | $Y_i(1)$ | $Y_i(0)$ | $\tau_i$ | $D_i$ | Observed $Y_i$ |
|------------|------|------|------|------|------|
| Zhang | 8,000 | ? | ? | 1 | 8,000 |
| Li | ? | 6,000 | ? | 0 | 6,000 |
| Wang | 9,000 | ? | ? | 1 | 9,000 |
| Zhao | ? | 5,500 | ? | 0 | 5,500 |

50% of data is inherently missing (not randomly missing!)
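What the analyst actually gets is the same table with the counterfactual cells blanked out; a short sketch of that observed dataset:

```python
import pandas as pd

god = pd.DataFrame({
    'name': ['Zhang', 'Li', 'Wang', 'Zhao'],
    'Y1':   [8000, 7500, 9000, 7000],
    'Y0':   [7000, 6000, 8500, 5500],
    'D':    [1, 0, 1, 0],
})

observed = god.copy()
observed['Y1'] = observed['Y1'].where(observed['D'] == 1)  # Y(1) seen only for the treated
observed['Y0'] = observed['Y0'].where(observed['D'] == 0)  # Y(0) seen only for the control
print(observed)  # exactly half of the potential-outcome cells are NaN
```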

Three Major Challenges

| Challenge | Description | Consequence |
|-----------|-------------|-------------|
| Selection bias | Treatment and control groups differ inherently | Simple comparison is biased |
| Confounding | A third variable affects both treatment and outcome | Causation cannot be isolated |
| Reverse causality | The outcome affects the treatment | Wrong causal direction |

Conditions for Causal Identification

Conditions for Ideal Experiment

To estimate the ATE without bias, we need:

$$E[Y_i(0) \mid D_i = 1] = E[Y_i(0) \mid D_i = 0]$$

Meaning: If neither group receives treatment, their average outcomes should be the same

Equivalent statement:

  • Treatment assignment is independent of potential outcomes
  • Denoted as: $(Y_i(0), Y_i(1)) \perp D_i$

How to Achieve This Condition?

Answer: Randomization!

  • Decide who receives treatment by coin flip
  • Ensures both groups are comparable on average across all characteristics
  • Eliminates selection bias

The next section will explain RCT (Randomized Controlled Trials) in detail


Python Implementation: Understanding Potential Outcomes

Case: Simulating "God's Perspective" Data

```python
import pandas as pd
import numpy as np

# Set random seed
np.random.seed(42)

# Generate potential outcomes (usually unobservable)
n = 1000
data = pd.DataFrame({
    'id': range(n),
    'ability': np.random.normal(100, 15, n),  # Latent ability
})

# Potential outcomes (God's perspective)
data['Y0'] = 5000 + 50 * data['ability'] + np.random.normal(0, 1000, n)
data['Y1'] = data['Y0'] + 1500 + np.random.normal(0, 500, n)  # Training effect +1500

# Individual causal effects
data['tau'] = data['Y1'] - data['Y0']

# True ATE
true_ATE = data['tau'].mean()
print(f"True ATE: {true_ATE:.2f}")

# Self-selection: High-ability individuals more likely to attend training
prob_treat = 1 / (1 + np.exp(-(data['ability'] - 100) / 10))
data['D'] = np.random.binomial(1, prob_treat)

# Observed outcome (fundamental problem: only observe one)
data['Y_obs'] = data['D'] * data['Y1'] + (1 - data['D']) * data['Y0']

# Simple comparison (biased!)
naive_estimate = (data[data['D'] == 1]['Y_obs'].mean() -
                  data[data['D'] == 0]['Y_obs'].mean())

print(f"Simple comparison: {naive_estimate:.2f}")
print(f"Bias: {naive_estimate - true_ATE:.2f}")

Output: with this data-generating process, the true ATE comes out close to 1,500 (the simulated training effect of +1,500 plus noise), while the simple comparison is noticeably larger, so the bias is positive.

Conclusion: Due to selection bias, the simple comparison overestimates the training effect!

Visualization: Potential Outcomes

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot 1: Potential outcomes distribution
axes[0].scatter(data['Y0'], data['Y1'], alpha=0.3, s=10)
axes[0].plot([3000, 12000], [3000, 12000], 'r--', label='45° line')
axes[0].set_xlabel('Y(0): Income without training')
axes[0].set_ylabel('Y(1): Income with training')
axes[0].set_title('Potential Outcomes (God\'s Perspective)')
axes[0].legend()

# Plot 2: Individual causal effects distribution
axes[1].hist(data['tau'], bins=50, edgecolor='black', alpha=0.7)
axes[1].axvline(true_ATE, color='red', linestyle='--',
                label=f'Average effect = {true_ATE:.0f}')
axes[1].set_xlabel('Individual causal effect τ_i')
axes[1].set_ylabel('Frequency')
axes[1].set_title('Heterogeneity in Causal Effects')
axes[1].legend()

# Plot 3: Selection bias visualization
treated = data[data['D'] == 1]
control = data[data['D'] == 0]

axes[2].hist(treated['Y0'], bins=30, alpha=0.5, label='Treated group Y(0)', color='blue')
axes[2].hist(control['Y0'], bins=30, alpha=0.5, label='Control group Y(0)', color='orange')
axes[2].set_xlabel('Y(0): Income without training (counterfactual)')
axes[2].set_ylabel('Frequency')
axes[2].set_title('Selection Bias: Different Baselines')
axes[2].legend()

plt.tight_layout()
plt.show()
```

Different Types of Average Effects

ATE vs ATT vs ATU

| Effect | Definition | Formula | Application Scenario |
|--------|------------|---------|----------------------|
| ATE | Population average effect | $E[Y_i(1) - Y_i(0)]$ | Universal policy rollout |
| ATT | Treated-group average effect | $E[Y_i(1) - Y_i(0) \mid D_i = 1]$ | Evaluate participants |
| ATU | Untreated-group average effect | $E[Y_i(1) - Y_i(0) \mid D_i = 0]$ | Evaluate non-participants |
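Assuming the simulated `data` DataFrame from the Python section above is still in memory, the three quantities can be computed directly (a sketch, not part of the original simulation):

```python
ate = data['tau'].mean()                      # everyone
att = data.loc[data['D'] == 1, 'tau'].mean()  # those who actually trained
atu = data.loc[data['D'] == 0, 'tau'].mean()  # those who did not train

print(f"ATE: {ate:.1f}, ATT: {att:.1f}, ATU: {atu:.1f}")
# In this simulation tau_i was generated independently of ability, so all three are
# close to 1500; when effects are heterogeneous and related to who selects into
# treatment, ATE, ATT and ATU can differ substantially.
```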

When Does ATE = ATT?

Condition: Homogeneous treatment effects, i.e., $Y_i(1) - Y_i(0) = \tau$ for every individual $i$

Reality: Usually there is heterogeneity

  • High-performing students may benefit more from training
  • Drug effects may differ between healthy and sick populations

Summary

Core Concepts

| Concept | Meaning |
|---------|---------|
| Potential outcomes | The outcomes an individual would have under each treatment state |
| Counterfactual | The state that did not actually occur (unobservable) |
| Individual causal effect | $\tau_i = Y_i(1) - Y_i(0)$ |
| Average causal effect | $\text{ATE} = E[Y_i(1) - Y_i(0)]$ |
| Selection bias | Bias caused by baseline differences between groups |
| Fundamental problem | $Y_i(1)$ and $Y_i(0)$ can never be observed simultaneously |

Key Insights

  1. Correlation ≠ Causation

    • Simple comparisons are usually biased
    • Bias sources: selection, confounding, reverse causality
  2. Goal of Causal Inference

    • Estimate average causal effect (ATE)
    • Need to construct comparable control groups
  3. Solutions Preview

    • Randomization (RCT): Most credible
    • Quasi-experiments (DID, RDD, IV): Second-best options
    • Matching methods (PSM): Require strong assumptions

Practice Questions

  1. Conceptual question: Why do we say "$Y_i(1)$ and $Y_i(0)$ both exist, but we can only observe one of them"? How should we understand this statement?

  2. Case question: Research finds "coffee drinkers live longer on average." Analyze using the potential outcomes framework:

    • Define $Y_i(1)$ and $Y_i(0)$ for this setting
    • What selection biases might exist?
    • How would you design an RCT?
  3. Calculation question: Assume the true data is as follows:

| Individual | $Y_i(1)$ | $Y_i(0)$ | $D_i$ |
|------------|------|------|------|
| A | 80 | 70 | 1 |
| B | 75 | 65 | 0 |
| C | 90 | 85 | 1 |
| D | 70 | 60 | 0 |

Calculate:

  • (a) True ATE
  • (b) ATT
  • (c) Simple comparison estimate
  • (d) Selection bias
Click to see answers

(a) ATE = [(80-70) + (75-65) + (90-85) + (70-60)] / 4 = [10+10+5+10] / 4 = 8.75

(b) ATT = [(80-70) + (90-85)] / 2 = [10+5] / 2 = 7.5

(c) Simple comparison = [(80+90)/2] - [(65+60)/2] = 85 - 62.5 = 22.5

(d) Selection bias = E[Y(0)|D=1] - E[Y(0)|D=0] = [(70+85)/2] - [(65+60)/2] = 77.5 - 62.5 = 15

Check: Simple comparison = ATT + Selection bias = 7.5 + 15 = 22.5 ✓. Note that the gap between the simple comparison and the ATE (22.5 - 8.75 = 13.75) mixes the selection bias (15) with the difference between ATT and ATE (7.5 - 8.75 = -1.25), so it is not the selection bias itself.
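The answers can be checked quickly in pandas (a sketch using the question-3 table):

```python
import pandas as pd

q3 = pd.DataFrame({
    'id': ['A', 'B', 'C', 'D'],
    'Y1': [80, 75, 90, 70],
    'Y0': [70, 65, 85, 60],
    'D':  [1, 0, 1, 0],
})
q3['tau'] = q3['Y1'] - q3['Y0']
q3['Y_obs'] = q3['D'] * q3['Y1'] + (1 - q3['D']) * q3['Y0']

treated, control = q3[q3['D'] == 1], q3[q3['D'] == 0]
print("ATE:", q3['tau'].mean())                                        # 8.75
print("ATT:", treated['tau'].mean())                                   # 7.5
print("Naive:", treated['Y_obs'].mean() - control['Y_obs'].mean())     # 22.5
print("Selection bias:", treated['Y0'].mean() - control['Y0'].mean())  # 15.0
```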


Next Steps

In the next section, we'll learn about Randomized Controlled Trials (RCT) and see how randomization solves the fundamental problem of causal inference.

Preview of Core Questions:

  • Why does randomization eliminate selection bias?
  • What are the experimental design principles of RCT?
  • How to implement RCT analysis in Python?

Keep going! 🚀


References:

  • Rubin, D. B. (1974). "Estimating causal effects of treatments in randomized and nonrandomized studies". Journal of Educational Psychology.
  • Holland, P. W. (1986). "Statistics and Causal Inference". Journal of the American Statistical Association.
  • Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press.
