2.3 Randomized Controlled Trials (RCTs)

"Randomization is the only method known to us that can be relied on to control for all relevant factors."— Abhijit Banerjee & Esther Duflo, 2019 Nobel Laureates in Economics

The Gold Standard of Causal Inference

Section Objectives

Understand how randomization eliminates selection bias
Master RCT design principles and types
Learn balance check methods
Understand advantages and limitations of RCT

The Magic of Randomization

Core Idea

RCT = Randomized Controlled Trial

Key Operation: Decide who receives treatment by a coin flip (in practice, a random number generator)

Usually $P(D_i = 1) = 0.5$ (treatment and control groups each 50%), where $D_i = 1$ means individual $i$ is assigned to treatment

Why Does Randomization Work?

Law of Large Numbers + Independence → Balance

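Random assignment makes treatment status $D_i$ independent of the potential outcomes $Y_i(1), Y_i(0)$ and the covariates $X_i$. When sample size is large enough, this independence shows up as balance between the groups:

$$(Y_i(1), Y_i(0), X_i) \perp D_i$$

Meaning: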

Both groups are on average the same across all characteristics
Including observable characteristics (age, education) and unobservable characteristics (ability, motivation)
The only difference between groups is whether they receive treatment
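
The balance claim can be checked with a small simulation. The sketch below is illustrative only: the traits `age` and `motivation`, their distributions, and the sample size are assumptions made for the example, and `motivation` stands in for a characteristic the researcher never observes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical traits: 'age' is observed, 'motivation' is never recorded
age = rng.normal(35, 10, n)
motivation = rng.normal(0, 1, n)

# Coin-flip assignment that ignores every trait
treatment = rng.binomial(1, 0.5, n)

def mean_gap(x, d):
    """Treatment-group mean minus control-group mean of x."""
    return x[d == 1].mean() - x[d == 0].mean()

age_gap = mean_gap(age, treatment)
motivation_gap = mean_gap(motivation, treatment)

print(f"Age gap between groups:        {age_gap:.3f}")
print(f"Motivation gap between groups: {motivation_gap:.3f}")
```

Both gaps should be close to zero even though the assignment rule never looked at either trait, which is the sense in which randomization balances unobservables.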

Simple Comparison is Unbiased Under RCT

Mathematical Proof

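Under RCT, simple comparison:

$$
\begin{aligned}
& E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] \\
&= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] \\
&= E[Y_i(1)] - E[Y_i(0)] \\
&= \text{ATE}
\end{aligned}
$$

Key Steps:

Line 2: Observation rule ($Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$)
Line 3: Randomization ensures $(Y_i(1), Y_i(0)) \perp D_i$
Line 4: Definition of ATE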

Contrast: Without Randomization

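Selection bias reappears:

$$
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{ATT}} + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
$$

RCT Advantage: Second term (selection bias) = 0, and the first term (ATT) coincides with the ATE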
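
To make the contrast concrete, here is a minimal simulation sketch. Everything in it is assumed for illustration: an unobserved `ability` variable drives both the outcome and the decision to opt in, and the true effect is a constant 2.0. With a constant effect, the naive difference under self-selection equals the true effect plus the selection-bias term exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical potential outcomes with a constant true effect of 2.0
ability = rng.normal(0, 1, n)
y0 = 10 + 3 * ability + rng.normal(0, 1, n)
y1 = y0 + 2.0

def naive_difference(d):
    """Observed treated mean minus observed control mean."""
    y_obs = np.where(d == 1, y1, y0)
    return y_obs[d == 1].mean() - y_obs[d == 0].mean()

# Self-selection: high-ability people opt in more often
d_self = (ability + rng.normal(0, 1, n) > 0).astype(int)
# Randomization: assignment ignores ability
d_rct = rng.binomial(1, 0.5, n)

diff_self = naive_difference(d_self)
diff_rct = naive_difference(d_rct)

# Selection-bias term: E[Y(0) | D=1] - E[Y(0) | D=0] under self-selection
bias_self = y0[d_self == 1].mean() - y0[d_self == 0].mean()

print(f"Naive difference, self-selected: {diff_self:.2f}")
print(f"  of which selection bias:       {bias_self:.2f}")
print(f"Naive difference, randomized:    {diff_rct:.2f}")
```

In a real study `y0` is never observed for treated units, so `bias_self` can only be computed in a simulation like this one.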

RCT Design Types

1. Simple Randomization

Method: Each individual independently randomly assigned

python

import numpy as np

n = 1000
treatment = np.random.binomial(1, 0.5, n)

Advantages:

Simplest
Eliminates selection bias in expectation

Disadvantages:

Group sizes are random and may be unequal (e.g., 480 treatment, 520 control); the relative imbalance is worse in small samples
Covariates may be imbalanced by chance
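
How often coin-flip assignment produces lopsided groups can be gauged by simulation. In this sketch the 10-percentage-point threshold and the two sample sizes are arbitrary choices for illustration. It draws the treated count directly from a Binomial(n, 0.5) distribution, which is equivalent to summing n independent coin flips.

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws = 10_000

def share_lopsided(n, threshold=0.10):
    """Share of coin-flip randomizations whose treated fraction
    deviates from 0.5 by more than `threshold`."""
    treated_counts = rng.binomial(n, 0.5, size=n_draws)
    treated_share = treated_counts / n
    return np.mean(np.abs(treated_share - 0.5) > threshold)

lopsided_small = share_lopsided(20)
lopsided_large = share_lopsided(1000)

print(f"n = 20:   {lopsided_small:.1%} of randomizations are lopsided")
print(f"n = 1000: {lopsided_large:.1%} of randomizations are lopsided")
```

The gap between the two shares is the reason designs that fix group sizes in advance are attractive for small experiments.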

2. Stratified Randomization

Method: First stratify by covariates, then randomize within each stratum

Case: Education experiment

Stratum 1 (Elite high schools): 500 people → 250 treatment, 250 control
Stratum 2 (Regular high schools): 300 people → 150 treatment, 150 control
Stratum 3 (Vocational schools): 200 people → 100 treatment, 100 control

Python Implementation:

python

import pandas as pd
import numpy as np

# Generate data
np.random.seed(42)
data = pd.DataFrame({
    'student_id': range(1000),
    'school_type': np.random.choice(['Elite', 'Regular', 'Vocational'], 1000, p=[0.5, 0.3, 0.2])
})

# Stratified randomization
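def stratified_randomization(df, strata_col, prob=0.5):
    df = df.copy()  # Avoid mutating the caller's DataFrame
    df['treatment'] = 0
    for stratum in df[strata_col].unique():
        mask = df[strata_col] == stratum
        n_stratum = mask.sum()
        # Fix the number treated within the stratum, then shuffle who gets it
        n_treated = int(round(prob * n_stratum))
        assignment = np.zeros(n_stratum, dtype=int)
        assignment[:n_treated] = 1
        df.loc[mask, 'treatment'] = np.random.permutation(assignment)
    return df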

data = stratified_randomization(data, 'school_type')

# Check balance
print(data.groupby('school_type')['treatment'].value_counts())

Advantages:

Ensures balance on important covariates
Increases statistical power

When to Use:

Known variables have large effects on outcomes (e.g., gender, age, baseline scores)
Want to ensure sufficient subsample sizes
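
The power gain from stratifying on a strong predictor can be seen in a Monte Carlo comparison. The sketch below makes several assumptions for illustration: a single binary stratum, a stratum effect of 5.0 on the outcome, and a true treatment effect of 1.0. It compares the sampling spread of the difference in means under simple and stratified assignment.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_sims, true_effect = 200, 2000, 1.0

# Hypothetical binary stratum (e.g., low vs. high baseline score)
stratum = np.repeat([0, 1], n // 2)

def simple_assignment():
    """Independent coin flip for every unit."""
    return rng.binomial(1, 0.5, n)

def stratified_assignment():
    """Treat exactly half of the units inside each stratum."""
    d = np.zeros(n, dtype=int)
    for s in (0, 1):
        idx = np.flatnonzero(stratum == s)
        treated = rng.choice(idx, size=len(idx) // 2, replace=False)
        d[treated] = 1
    return d

def estimate(assign):
    """Run one simulated experiment and return the difference in means."""
    d = assign()
    y = 5.0 * stratum + true_effect * d + rng.normal(0, 1, n)
    return y[d == 1].mean() - y[d == 0].mean()

simple_est = np.array([estimate(simple_assignment) for _ in range(n_sims)])
strat_est = np.array([estimate(stratified_assignment) for _ in range(n_sims)])
sd_simple, sd_strat = simple_est.std(), strat_est.std()

print(f"Simple:     mean {simple_est.mean():.3f}, sd {sd_simple:.3f}")
print(f"Stratified: mean {strat_est.mean():.3f}, sd {sd_strat:.3f}")
```

Both designs center on the true effect; the stratified design simply removes the noise that comes from chance imbalance in the stratum variable.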

3. Matched-Pair Randomization

Method: First match similar individuals, then randomize within each pair

Case: School experiment

Form 50 pairs of similar schools (matched by region, size, performance)
Within each pair, randomly assign one school to treatment, the other to control

Python Implementation:

python

import numpy as np

# Assume 100 schools, paired two by two
n_pairs = 50
pairs = np.repeat(range(n_pairs), 2)  # [0,0,1,1,2,2,...]

# Randomize within each pair
treatment = np.zeros(100, dtype=int)
for pair_id in range(n_pairs):
    pair_indices = np.where(pairs == pair_id)[0]
    treatment[pair_indices[0]] = np.random.binomial(1, 0.5)
    treatment[pair_indices[1]] = 1 - treatment[pair_indices[0]]

print(f"Treatment group size: {treatment.sum()}")  # Should be exactly 50

Advantages:

Automatically balances covariates
Exactly equal sample sizes

Disadvantages:

Need to find suitable matches first
Analysis must account for paired structure (e.g., pair fixed effects or standard errors clustered by pair)
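
One simple way to respect the paired structure is to analyze within-pair differences. The following sketch uses simulated data with an assumed pair-level baseline and a true effect of 2.0; it is one possible analysis, not the only valid one. The point estimate is identical to the unpaired difference in means, but the standard error differs because the shared baseline cancels within each pair.

```python
import numpy as np

rng = np.random.default_rng(4)
n_pairs, true_effect = 50, 2.0

# Hypothetical pair-level baseline shared by both schools in a pair
pair_baseline = rng.normal(60, 10, n_pairs)
# Within each pair, one school is treated and the other is the control
y_treated = pair_baseline + true_effect + rng.normal(0, 2, n_pairs)
y_control = pair_baseline + rng.normal(0, 2, n_pairs)

# Paired analysis: one difference per pair removes the shared baseline
pair_diff = y_treated - y_control
ate_hat = pair_diff.mean()
se_paired = pair_diff.std(ddof=1) / np.sqrt(n_pairs)

# Unpaired analysis (ignores the pairing), shown for comparison
se_unpaired = np.sqrt(y_treated.var(ddof=1) / n_pairs +
                      y_control.var(ddof=1) / n_pairs)

print(f"ATE estimate: {ate_hat:.2f}")
print(f"SE using pairs:    {se_paired:.2f}")
print(f"SE ignoring pairs: {se_unpaired:.2f}")
```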

4. Cluster Randomization

Method: Randomize at the group level (e.g., classrooms, schools, villages)

Case: Education policy evaluation

Randomly select 50 schools to implement new policy
Another 50 schools as control group
Unit: Schools (not students)

Why Needed?

Spillover effects: Students in same class influence each other
Implementation feasibility: Cannot treat students differently within same school

Python Implementation:

python

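import numpy as np
import pandas as pd

# 100 schools: exactly 50 assigned to treatment, 50 to control
n_schools = 100
school_treatment = np.random.permutation(np.repeat([0, 1], n_schools // 2))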

# 30 students per school
students_per_school = 30
data = pd.DataFrame({
    'school_id': np.repeat(range(n_schools), students_per_school),
    'student_id': range(n_schools * students_per_school)
})

# Students inherit school treatment status
data['treatment'] = data['school_id'].map(
    dict(zip(range(n_schools), school_treatment))
)

print(data.groupby('school_id')['treatment'].first().value_counts())

Important Notes:

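Standard errors need cluster adjustment (Cluster-Robust SE)
Effective sample size is governed mainly by the number of clusters, not the number of individuals; the more similar outcomes are within a cluster, the closer it gets to the cluster count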
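
The effect of clustering on the effective sample size can be approximated with the standard design-effect formula for equal-sized clusters, $1 + (m - 1)\rho$, where $m$ is the cluster size and $\rho$ is the intra-cluster correlation (ICC). The ICC values below are assumptions chosen for illustration; the function names are hypothetical helpers, not part of any library.

```python
def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)

# 100 schools with 30 students each, under several assumed ICC values
for icc in (0.0, 0.05, 0.2, 1.0):
    n_eff = effective_sample_size(100, 30, icc)
    print(f"ICC = {icc:.2f} -> effective sample size of about {n_eff:.0f}")
```

With an ICC of 0 the 3,000 students count as 3,000 independent observations; with an ICC of 1 they count as only 100, the number of schools. Realistic values fall in between.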

Balance Checks

Why Needed?

Although randomization ensures balance in expectation, a realized sample may still be imbalanced due to:

Limited sample size
Random fluctuation
Implementation biases

Check Methods

1. Descriptive Statistics Comparison

python

import numpy as np
import pandas as pd
from scipy import stats

# Simulate data
np.random.seed(42)
n = 200
data = pd.DataFrame({
    'treatment': np.random.binomial(1, 0.5, n),
    'age': np.random.normal(30, 5, n),
    'income': np.random.normal(50000, 15000, n),
    'education': np.random.randint(12, 22, n)
})

# Balance table
balance_table = data.groupby('treatment').agg({
    'age': ['mean', 'std'],
    'income': ['mean', 'std'],
    'education': ['mean', 'std']
}).T

balance_table.columns = ['Control', 'Treatment']
print(balance_table.round(2))

Output:

                 Control  Treatment
age       mean     29.87      30.13
          std       5.12       4.88
income    mean  49234.56   50876.23
          std   14987.45   15234.78
education mean     16.47      16.53
          std       2.89       2.91

2. t-Test (Continuous Variables)

python

def balance_test_continuous(data, var, treatment_col='treatment'):
    """
    Balance t-test for continuous variables
    """
    treated = data[data[treatment_col] == 1][var]
    control = data[data[treatment_col] == 0][var]

    t_stat, p_value = stats.ttest_ind(treated, control)

    return {
        'Variable': var,
        'Treatment Mean': treated.mean(),
        'Control Mean': control.mean(),
        'Difference': treated.mean() - control.mean(),
        't-statistic': t_stat,
        'p-value': p_value,
        'Balanced': '✓' if p_value > 0.05 else '✗'
    }

# Batch test
balance_results = []
for var in ['age', 'income', 'education']:
    balance_results.append(balance_test_continuous(data, var))

balance_df = pd.DataFrame(balance_results)
print(balance_df.round(3))

3. Chi-Square Test (Categorical Variables)

python

# Add categorical variables
data['gender'] = np.random.choice(['Male', 'Female'], n)
data['region'] = np.random.choice(['East', 'Central', 'West'], n)

def balance_test_categorical(data, var, treatment_col='treatment'):
    """
    Balance chi-square test for categorical variables
    """
    contingency_table = pd.crosstab(data[var], data[treatment_col])
    chi2, p_value, dof, expected = stats.chi2_contingency(contingency_table)

    return {
        'Variable': var,
        'Chi-square': chi2,
        'p-value': p_value,
        'Balanced': '✓' if p_value > 0.05 else '✗'
    }

# Batch test
cat_results = []
for var in ['gender', 'region']:
    cat_results.append(balance_test_categorical(data, var))

cat_df = pd.DataFrame(cat_results)
print(cat_df.round(3))

4. Joint F-Test

python

import statsmodels.api as sm

# Regression: treatment ~ age + income + education + ...
X = data[['age', 'income', 'education']]
X = sm.add_constant(X)
y = data['treatment']

model = sm.OLS(y, X).fit()
print(model.summary())

# F-test: H0: all coefficients = 0
print(f"\nF-statistic: {model.fvalue:.3f}")
print(f"Prob (F-statistic): {model.f_pvalue:.4f}")

if model.f_pvalue > 0.05:
    print("✓ Covariates jointly balanced (F-test not significant)")
else:
    print("✗ Covariates may be imbalanced (F-test significant)")

Interpreting Balance Checks

p-value	Conclusion	Action
> 0.1	✓ Very balanced	No adjustment needed
0.05-0.1	⚠️ Borderline	Can add control variables
< 0.05	✗ Imbalanced	Must add control variables or re-randomize

Important Reminder:

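A balance check is not a conventional hypothesis test: here we hope not to reject the null hypothesis (we want p > 0.05)
A non-significant p-value does not prove balance, and a significant one does not by itself mean randomization failed
Multiple testing problem: when testing 20 independent variables at the 5% level, about 1 is expected to show p < 0.05 by chance alone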
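
The multiple testing point can be illustrated by simulating many perfectly randomized experiments and counting spurious "imbalances". In this sketch the covariates are independent standard normals unrelated to treatment (an assumption for the example), and a large-sample cutoff of |t| > 1.96 stands in for p < 0.05.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_covariates, n_experiments = 1000, 20, 500

def count_false_alarms():
    """Number of covariates with |t| > 1.96 in one perfectly randomized experiment."""
    d = rng.binomial(1, 0.5, n)
    x = rng.normal(0, 1, (n, n_covariates))  # covariates unrelated to d
    x1, x0 = x[d == 1], x[d == 0]
    se = np.sqrt(x1.var(axis=0, ddof=1) / len(x1) +
                 x0.var(axis=0, ddof=1) / len(x0))
    t_stats = (x1.mean(axis=0) - x0.mean(axis=0)) / se
    return int(np.sum(np.abs(t_stats) > 1.96))

false_alarms = np.array([count_false_alarms() for _ in range(n_experiments)])
avg_false_alarms = false_alarms.mean()
share_with_any = np.mean(false_alarms >= 1)

print(f"Average 'imbalanced' covariates per experiment: {avg_false_alarms:.2f}")
print(f"Share of experiments with at least one:         {share_with_any:.1%}")
```

This is why a joint test, or a correction for the number of covariates tested, is usually more informative than reading individual p-values one at a time.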

RCT Estimation Methods

Method 1: Simple Difference

python

# Simplest ATE estimate (assumes `data` has an observed 'outcome' column)
ATE_simple = data[data['treatment'] == 1]['outcome'].mean() - \
             data[data['treatment'] == 0]['outcome'].mean()

# Standard error allowing unequal variances across groups (Neyman/Welch form)
n1 = (data['treatment'] == 1).sum()
n0 = (data['treatment'] == 0).sum()
s1 = data[data['treatment'] == 1]['outcome'].std()
s0 = data[data['treatment'] == 0]['outcome'].std()

se_simple = np.sqrt(s1**2 / n1 + s0**2 / n0)

# 95% confidence interval
ci_lower = ATE_simple - 1.96 * se_simple
ci_upper = ATE_simple + 1.96 * se_simple

print(f"ATE: {ATE_simple:.2f}")
print(f"95% CI: [{ci_lower:.2f}, {ci_upper:.2f}]")

Method 2: Regression Estimation

python

import statsmodels.api as sm

X = sm.add_constant(data['treatment'])
y = data['outcome']

model = sm.OLS(y, X).fit(cov_type='HC3')  # Heteroskedasticity-robust SE
print(model.summary())

# ATE = coefficient on treatment variable
ATE_reg = model.params['treatment']
se_reg = model.bse['treatment']

print(f"\nATE (regression): {ATE_reg:.2f}")
print(f"Standard error: {se_reg:.2f}")

Advantages:

Can add control variables (improve precision)
Heteroskedasticity-robust standard errors are readily available (e.g., cov_type='HC3' in statsmodels)
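
Regressing the outcome on a constant and the treatment dummy reproduces the simple difference in means exactly. The sketch below checks this numerically on simulated data (the outcome model and the effect size of 1.5 are assumptions for illustration), using plain least squares from NumPy rather than statsmodels.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
d = rng.binomial(1, 0.5, n)
y = 3.0 + 1.5 * d + rng.normal(0, 1, n)  # hypothetical outcome, true effect 1.5

# Simple difference in means
diff_in_means = y[d == 1].mean() - y[d == 0].mean()

# OLS of y on a constant and the treatment dummy
X = np.column_stack([np.ones(n), d])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

print(f"Difference in means:      {diff_in_means:.4f}")
print(f"OLS treatment coefficient: {slope:.4f}")
print(f"OLS intercept (control mean): {intercept:.4f}")
```

The intercept equals the control-group mean and the slope equals the treated-minus-control gap, so the two methods differ only in how standard errors are computed and in how easily controls can be added.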

Method 3: Regression + Control Variables

python

# Add control variables
X = data[['treatment', 'age', 'income', 'education']]
X = sm.add_constant(X)
y = data['outcome']

model_control = sm.OLS(y, X).fit(cov_type='HC3')  # Heteroskedasticity-robust SE
print(model_control.summary())

# Compare precision
print(f"\nWithout controls SE: {se_reg:.3f}")
print(f"With controls SE: {model_control.bse['treatment']:.3f}")
print(f"Precision improvement: {(1 - model_control.bse['treatment'] / se_reg) * 100:.1f}%")

Why Add Control Variables in RCT?

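Under an RCT, the estimate is unbiased with or without controls, because randomization makes treatment uncorrelated with the covariates
But controls that predict the outcome absorb residual variance, which reduces standard errors (increases statistical power)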
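
A Monte Carlo sketch makes both points visible at once. The data-generating process is assumed for illustration: one baseline covariate `x` with a coefficient of 2.0 and a true treatment effect of 1.0. Each simulated experiment is estimated twice, without and with the covariate, using NumPy least squares.

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_sims, true_effect = 300, 2000, 1.0

def one_experiment():
    """Return the treatment coefficient without and with the covariate."""
    x = rng.normal(0, 1, n)  # hypothetical baseline covariate
    d = rng.binomial(1, 0.5, n)
    y = 2.0 * x + true_effect * d + rng.normal(0, 1, n)
    ones = np.ones(n)
    b_plain = np.linalg.lstsq(np.column_stack([ones, d]), y, rcond=None)[0][1]
    b_adj = np.linalg.lstsq(np.column_stack([ones, d, x]), y, rcond=None)[0][1]
    return b_plain, b_adj

results = np.array([one_experiment() for _ in range(n_sims)])
mean_plain, mean_adj = results.mean(axis=0)
sd_plain, sd_adj = results.std(axis=0)

print(f"Without controls: mean {mean_plain:.3f}, sd {sd_plain:.3f}")
print(f"With controls:    mean {mean_adj:.3f}, sd {sd_adj:.3f}")
```

Both estimators center on the true effect, while the adjusted one varies much less across experiments. The size of the gain depends on how much outcome variance the controls explain.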

Advantages and Limitations of RCT

Advantages (Why the Gold Standard)

Advantage	Explanation
Strongest Internal Validity	Randomization balances all confounders, observed and unobserved, in expectation
Few Assumptions Needed	No functional form assumptions and no assumptions about unobservables
Simple and Transparent	Intuitive analysis, credible results
Heterogeneity Analysis	Easy to study subgroup effects

Limitations

Limitation	Explanation	Response
High Cost	Time, money, personnel	Consider quasi-experimental methods
Ethical Issues	Some treatments cannot be randomized (e.g., smoking)	Observational data + causal inference methods
External Validity	Experimental sample may not represent population	Multi-site replication
Hawthorne Effect	Participants change behavior when observed	Double-blind design
Attrition	Tracking difficulties lead to missing outcome data	Intensive follow-up; check for differential attrition; bounds analysis
Non-compliance	Assigned to treatment but didn't receive it	Intention-to-treat (ITT) analysis; instrumental variables (IV)
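
As a brief preview of the non-compliance row, the sketch below contrasts an intention-to-treat comparison with the Wald (IV) ratio on simulated data. All numbers are assumptions for illustration: 60% of those assigned to treatment actually take it, nobody in the control group does, and the effect for those treated is 5.0.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000

z = rng.binomial(1, 0.5, n)             # random assignment
complier = rng.binomial(1, 0.6, n)      # hypothetical: 60% take treatment if assigned
d = z * complier                        # treatment actually received
y = 10 + 5.0 * d + rng.normal(0, 2, n)  # hypothetical effect of 5 for those treated

# Intention-to-treat: compare by assignment, ignoring actual take-up
itt = y[z == 1].mean() - y[z == 0].mean()
# First stage: effect of assignment on take-up
take_up = d[z == 1].mean() - d[z == 0].mean()
# Wald / IV estimate: ITT rescaled by the take-up rate
wald = itt / take_up

print(f"ITT effect of assignment: {itt:.2f}")
print(f"Take-up difference:       {take_up:.2f}")
print(f"Wald (IV) estimate:       {wald:.2f}")
```

The ITT estimate is diluted by the people who were assigned but never treated; dividing by the take-up difference recovers the effect among those who complied.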

Classic RCT Cases

Case 1: PROGRESA (Mexico Conditional Cash Transfer)

Background:

1997 Mexican government anti-poverty program
Cash transfers to poor families conditional on children attending school and regular health checkups

Experimental Design:

Randomly selected 320 villages for immediate implementation
186 villages delayed 2 years (control group)
Unit: Villages (cluster randomization)

Results:

Child enrollment increased by 3.4 percentage points
Significant health improvements
Later scaled nationwide (renamed Oportunidades)

Key Lessons:

Policy rollout can use phase-in design for RCT
Control group only delayed not permanently deprived
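
A phase-in assignment of this kind can be sketched in a few lines. The village identifiers below are hypothetical placeholders; only the group sizes (320 early, 186 late) come from the design described above.

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical village identifiers; counts mirror the design described above
n_early, n_late = 320, 186
village_ids = np.arange(n_early + n_late)

# Shuffle once: the first block starts immediately, the rest wait
shuffled = rng.permutation(village_ids)
early_villages = set(shuffled[:n_early].tolist())

phase = np.array(["early" if v in early_villages else "late"
                  for v in village_ids])

print(f"Early rollout villages: {(phase == 'early').sum()}")
print(f"Late rollout villages:  {(phase == 'late').sum()}")
```

Until the late group is phased in, it serves as the control group for the early group.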

Case 2: STAR Project (Tennessee Class Size)

Background:

1985-1989 study of class size effects on academic achievement

Experimental Design:

11,600 students randomly assigned to:
- Small class (13-17 students)
- Regular class (22-25 students)
- Regular class + teaching aide
Unit: Students (within-school randomization)

Results:

Small class students had significantly higher achievement
Effects persisted to high school and adulthood
Larger effects for disadvantaged groups

Data: Publicly available dataset, widely used for teaching

python

# Load STAR data (example)
# data = pd.read_csv('STAR_data.csv')
# ATE = data[data['small_class'] == 1]['test_score'].mean() - \
#       data[data['small_class'] == 0]['test_score'].mean()

Case 3: Oregon Health Insurance Experiment

Background:

2008 Oregon expanded Medicaid (healthcare assistance)
Limited slots allocated by lottery

Experimental Design:

The lottery created a naturally occurring RCT
Lottery winners could apply for Medicaid (treatment group)
Lottery losers as control group

Results:

Health insurance significantly reduced financial stress
Improved mental health
But effects on physical health indicators not significant (controversial)

Complete Python Practice: Simulating RCT

python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm

# Set random seed
np.random.seed(42)

# ==== 1. Data Generation ====
n = 500

# Covariates
data = pd.DataFrame({
    'age': np.random.normal(30, 10, n),
    'income': np.random.lognormal(10.5, 0.5, n),
    'education': np.random.randint(9, 23, n)
})

# Random assignment (RCT)
data['treatment'] = np.random.binomial(1, 0.5, n)

# Outcome variable (simulate true causal effect = 1000)
data['Y0'] = (500 + 50 * data['age'] + 0.05 * data['income'] +
              100 * data['education'] + np.random.normal(0, 500, n))
data['Y1'] = data['Y0'] + 1000 + np.random.normal(0, 200, n)
data['Y_obs'] = np.where(data['treatment'] == 1, data['Y1'], data['Y0'])

# ==== 2. Balance Checks ====
print("=" * 50)
print("Balance Checks")
print("=" * 50)

for var in ['age', 'income', 'education']:
    treated = data[data['treatment'] == 1][var]
    control = data[data['treatment'] == 0][var]
    t_stat, p_val = stats.ttest_ind(treated, control)

    print(f"\n{var}:")
    print(f"  Treatment mean: {treated.mean():.2f}")
    print(f"  Control mean: {control.mean():.2f}")
    print(f"  Difference: {treated.mean() - control.mean():.2f}")
    print(f"  p-value: {p_val:.4f} {'✓' if p_val > 0.05 else '✗'}")

# ==== 3. ATE Estimation ====
print("\n" + "=" * 50)
print("ATE Estimation")
print("=" * 50)

# Method 1: Simple difference
ATE_simple = (data[data['treatment'] == 1]['Y_obs'].mean() -
              data[data['treatment'] == 0]['Y_obs'].mean())
print(f"\nSimple difference: {ATE_simple:.2f}")

# Method 2: Regression (no controls)
X = sm.add_constant(data['treatment'])
model1 = sm.OLS(data['Y_obs'], X).fit(cov_type='HC3')
print(f"\nRegression (no controls):")
print(f"  ATE: {model1.params['treatment']:.2f}")
print(f"  SE: {model1.bse['treatment']:.2f}")
print(f"  95% CI: [{model1.conf_int().loc['treatment', 0]:.2f}, "
      f"{model1.conf_int().loc['treatment', 1]:.2f}]")

# Method 3: Regression (with controls)
X_control = sm.add_constant(data[['treatment', 'age', 'income', 'education']])
model2 = sm.OLS(data['Y_obs'], X_control).fit(cov_type='HC3')
print(f"\nRegression (with controls):")
print(f"  ATE: {model2.params['treatment']:.2f}")
print(f"  SE: {model2.bse['treatment']:.2f}")
print(f"  95% CI: [{model2.conf_int().loc['treatment', 0]:.2f}, "
      f"{model2.conf_int().loc['treatment', 1]:.2f}]")
print(f"\nPrecision improvement: {(1 - model2.bse['treatment'] / model1.bse['treatment']) * 100:.1f}%")

# ==== 4. Visualization ====
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Plot 1: Treatment assignment
axes[0, 0].bar(['Control', 'Treatment'],
               data['treatment'].value_counts().sort_index(),
               color=['skyblue', 'salmon'])
axes[0, 0].set_ylabel('Sample Size')
axes[0, 0].set_title('Randomization Results')

# Plot 2: Outcome distribution comparison
axes[0, 1].hist(data[data['treatment'] == 0]['Y_obs'],
                bins=30, alpha=0.5, label='Control', color='skyblue')
axes[0, 1].hist(data[data['treatment'] == 1]['Y_obs'],
                bins=30, alpha=0.5, label='Treatment', color='salmon')
axes[0, 1].axvline(data[data['treatment'] == 0]['Y_obs'].mean(),
                   color='blue', linestyle='--', linewidth=2)
axes[0, 1].axvline(data[data['treatment'] == 1]['Y_obs'].mean(),
                   color='red', linestyle='--', linewidth=2)
axes[0, 1].set_xlabel('Outcome Variable')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].set_title('Outcome Distribution Comparison')
axes[0, 1].legend()

# Plot 3: Covariate balance (age)
axes[1, 0].boxplot([data[data['treatment'] == 0]['age'],
                    data[data['treatment'] == 1]['age']])
axes[1, 0].set_xticklabels(['Control', 'Treatment'])
axes[1, 0].set_ylabel('Age')
axes[1, 0].set_title('Covariate Balance: Age')

# Plot 4: ATE estimate comparison
methods = ['Simple\nDiff', 'Regression\n(no controls)', 'Regression\n(with controls)']
estimates = [ATE_simple, model1.params['treatment'], model2.params['treatment']]
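# Standard error of the simple difference (unequal-variance form)
t_obs = data.loc[data['treatment'] == 1, 'Y_obs']
c_obs = data.loc[data['treatment'] == 0, 'Y_obs']
se_simple = np.sqrt(t_obs.var() / len(t_obs) + c_obs.var() / len(c_obs))
se = [se_simple, model1.bse['treatment'], model2.bse['treatment']]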

axes[1, 1].errorbar(methods, estimates, yerr=[1.96*s for s in se],
                    fmt='o', capsize=5, capthick=2, markersize=8)
axes[1, 1].axhline(1000, color='red', linestyle='--', label='True ATE')
axes[1, 1].set_ylabel('ATE Estimate')
axes[1, 1].set_title('ATE Estimates by Method')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('rct_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n✓ RCT analysis complete!")

Summary

Core Points of RCT

Point	Content
Core Mechanism	Randomization → Independence → Balance → Unbiased estimation
Design Types	Simple/Stratified/Matched/Cluster randomization
Balance Checks	t-test, chi-square test, F-test
Estimation Methods	Simple difference, regression, regression + controls
Advantages	Strongest internal validity
Limitations	High cost, ethical issues, external validity

Key Insights

Randomization is the Ultimate Weapon of Causal Inference
- Eliminates all (observable and unobservable) confounders
- Simple comparison yields unbiased estimates
Balance Checks are Quality Assurance
- Not hypothesis testing (want non-significance)
- Must add controls if imbalanced
RCT is Not a Panacea
- Many important questions cannot use RCT (ethics, feasibility)
- Need to combine with quasi-experimental methods

Practice Questions

Design question: You want to study "the causal effect of remote work on employee productivity."
- (a) Design an RCT
- (b) Which type of randomization should you use? Why?
- (c) What implementation challenges might you face?
Analysis question: An RCT has 1000 people. After random assignment:
- Treatment group average age 35, control group 30 (p = 0.02)
- All other variables balanced (p > 0.1)
Questions:
- (a) Is this RCT valid?
- (b) How should you analyze the data?
Understanding question: Why does adding control variables in RCT not change unbiasedness of ATE estimate, but improve precision?

Answer Hints

Question 1:

(b) Should use cluster randomization (by department), because spillover effects exist between employees in same department
(c) Challenges: Hawthorne effect, difficulty measuring productivity, attrition (employee turnover)

Question 2:

(a) RCT still valid, but age variable imbalanced
(b) Must control for age in regression

Question 3:

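Randomization makes treatment uncorrelated with the control variables, so including them does not shift the expected treatment coefficient
Control variables absorb part of the outcome variable's variance
This reduces residual standard deviation → reduces standard error → increases statistical power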

Next Steps

In the next section, we'll study Average Treatment Effects (ATE, ATT, LATE) in depth, along with how to handle non-compliance in RCTs.

Preview of Core Questions:

What's the difference between ATE, ATT, and LATE?
What is intention-to-treat (ITT) analysis?
How to use instrumental variables to handle non-compliance?

Keep going! 🚀

