
2.3 Randomized Controlled Trials (RCTs)

"Randomization is the only method known to us that can be relied on to control for all relevant factors."— Abhijit Banerjee & Esther Duflo, 2019 Nobel Laureates in Economics

The Gold Standard of Causal Inference


Section Objectives

  • Understand how randomization eliminates selection bias
  • Master RCT design principles and types
  • Learn balance check methods
  • Understand the advantages and limitations of RCTs

The Magic of Randomization

Core Idea

RCT stands for Randomized Controlled Trial.

Key Operation: Decide who receives treatment with a coin flip (in practice, a random number generator).

Usually the assignment probability is 0.5 (treatment and control groups each receive 50% of the sample).

Why Does Randomization Work?

Law of Large Numbers + Independence → Balance

When the sample size is large enough, random assignment ensures:

$$
(Y_1, Y_0, X) \perp D
\quad\Longrightarrow\quad
E[X \mid D=1] = E[X \mid D=0]
$$

Meaning:

  • Both groups are on average the same across all characteristics
  • Including observable characteristics (age, education) and unobservable characteristics (ability, motivation)
  • The only difference between groups is whether they receive treatment
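To see the balance property in action, here is a minimal simulation (entirely hypothetical data, with an 'ability' variable we pretend is unobservable): after random assignment, group means are nearly identical for observed and unobserved characteristics alike.

python
import numpy as np
import pandas as pd

np.random.seed(0)
n = 100_000

# Characteristics: 'ability' stands in for something we normally cannot observe
df = pd.DataFrame({
    'age': np.random.normal(30, 5, n),
    'ability': np.random.normal(0, 1, n),      # unobservable in real data
})
df['treatment'] = np.random.binomial(1, 0.5, n)  # random assignment

# Means are nearly identical across groups, for observed AND unobserved traits
print(df.groupby('treatment')[['age', 'ability']].mean().round(3))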

Simple Comparison is Unbiased Under RCT

Mathematical Proof

Under RCT, the simple difference of group means identifies the ATE:

$$
\begin{aligned}
& E[Y \mid D=1] - E[Y \mid D=0] \\
&= E[Y_1 \mid D=1] - E[Y_0 \mid D=0] \\
&= E[Y_1] - E[Y_0] \\
&= \text{ATE}
\end{aligned}
$$

Key Steps:

  • Line 2: Observation rule ($Y = D\,Y_1 + (1-D)\,Y_0$)
  • Line 3: Randomization ensures $(Y_1, Y_0) \perp D$, so $E[Y_1 \mid D=1] = E[Y_1]$ and $E[Y_0 \mid D=0] = E[Y_0]$
  • Line 4: Definition of the ATE, $E[Y_1 - Y_0]$

Contrast: Without Randomization

Selection bias reappears:

$$
E[Y \mid D=1] - E[Y \mid D=0]
= \underbrace{E[Y_1 - Y_0 \mid D=1]}_{\text{ATT}}
+ \underbrace{E[Y_0 \mid D=1] - E[Y_0 \mid D=0]}_{\text{selection bias}}
$$

RCT Advantage: The second term (selection bias) equals 0, because randomization makes $E[Y_0 \mid D=1] = E[Y_0 \mid D=0]$.
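A short simulation (hypothetical numbers, not from any real study) contrasts the two cases: when individuals self-select into treatment based on their untreated outcome, the naive difference is biased; with the same population randomized, it recovers the true effect.

python
import numpy as np

np.random.seed(1)
n = 200_000
true_effect = 5.0

Y0 = np.random.normal(50, 10, n)
Y1 = Y0 + true_effect

# Self-selection: people with low Y0 are more likely to seek treatment
p_treat = 1 / (1 + np.exp((Y0 - 50) / 5))
D_selected = (np.random.uniform(0, 1, n) < p_treat).astype(int)
Y_obs_sel = np.where(D_selected == 1, Y1, Y0)
naive_sel = Y_obs_sel[D_selected == 1].mean() - Y_obs_sel[D_selected == 0].mean()

# Randomization: coin flip, independent of Y0
D_random = np.random.binomial(1, 0.5, n)
Y_obs_rct = np.where(D_random == 1, Y1, Y0)
naive_rct = Y_obs_rct[D_random == 1].mean() - Y_obs_rct[D_random == 0].mean()

print(f"True effect:               {true_effect:.2f}")
print(f"Naive diff, self-selected: {naive_sel:.2f}   # biased downward")
print(f"Naive diff, randomized:    {naive_rct:.2f}   # close to the true effect")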


RCT Design Types

1. Simple Randomization

Method: Each individual independently randomly assigned

python
import numpy as np

n = 1000
treatment = np.random.binomial(1, 0.5, n)

Advantages:

  • Simplest to implement
  • Eliminates selection bias in expectation

Disadvantages:

  • Group sizes may be unequal in small samples (e.g., 480 treatment vs. 520 control); see the quick check below
  • Covariates may be imbalanced by chance
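A quick check of the first disadvantage (hypothetical sample sizes): under simple randomization, the treated share can fluctuate noticeably when the sample is small.

python
import numpy as np

np.random.seed(2)

for n in [20, 100, 1000]:
    # 1,000 repetitions of simple randomization with assignment probability 0.5
    shares = np.random.binomial(1, 0.5, size=(1000, n)).mean(axis=1)
    print(f"n={n:5d}: treated share ranges from {shares.min():.2f} to {shares.max():.2f}")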

2. Stratified Randomization

Method: First stratify by covariates, then randomize within each stratum

Case: Education experiment

Stratum 1 (Elite high schools): 500 people → 250 treatment, 250 control
Stratum 2 (Regular high schools): 300 people → 150 treatment, 150 control
Stratum 3 (Vocational schools): 200 people → 100 treatment, 100 control

Python Implementation:

python
import pandas as pd
import numpy as np

# Generate data
np.random.seed(42)
data = pd.DataFrame({
    'student_id': range(1000),
    'school_type': np.random.choice(['Elite', 'Regular', 'Vocational'], 1000, p=[0.5, 0.3, 0.2])
})

# Stratified randomization
def stratified_randomization(df, strata_col, prob=0.5):
    df = df.copy()
    df['treatment'] = 0
    for stratum in df[strata_col].unique():
        mask = df[strata_col] == stratum
        n_stratum = mask.sum()
        n_treated = int(round(n_stratum * prob))
        # Shuffle a fixed number of 1s so each stratum is split exactly prob : (1 - prob)
        assignment = np.zeros(n_stratum, dtype=int)
        assignment[:n_treated] = 1
        df.loc[mask, 'treatment'] = np.random.permutation(assignment)
    return df

data = stratified_randomization(data, 'school_type')

# Check balance
print(data.groupby('school_type')['treatment'].value_counts())

Advantages:

  • Ensures balance on important covariates
  • Increases statistical power

When to Use:

  • Known variables have large effects on outcomes (e.g., gender, age, baseline scores)
  • Want to ensure sufficient subsample sizes

3. Matched-Pair Randomization

Method: First match similar individuals, then randomize within each pair

Case: School experiment

  • Match 50 similar schools (by region, size, performance)
  • Within each pair, randomly assign one school to treatment, the other to control

Python Implementation:

python
# Assume 100 schools, paired two by two
n_pairs = 50
pairs = np.repeat(range(n_pairs), 2)  # [0,0,1,1,2,2,...]

# Randomize within each pair
treatment = np.zeros(100, dtype=int)
for pair_id in range(n_pairs):
    pair_indices = np.where(pairs == pair_id)[0]
    treatment[pair_indices[0]] = np.random.binomial(1, 0.5)
    treatment[pair_indices[1]] = 1 - treatment[pair_indices[0]]

print(f"Treatment group size: {treatment.sum()}")  # Should be exactly 50

Advantages:

  • Automatically balances covariates
  • Exactly equal sample sizes

Disadvantages:

  • Need to find suitable matches first
  • Analysis must account for the paired structure (e.g., pair fixed effects with standard errors clustered at the pair level); see the sketch below
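One possible way to respect the pairing at the analysis stage, shown as a minimal sketch with a simulated outcome (the variable names and effect size are assumptions for illustration): regress the outcome on treatment with pair fixed effects and cluster the standard errors at the pair level.

python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

np.random.seed(3)
n_pairs = 50
pairs = np.repeat(range(n_pairs), 2)

# Randomize within each pair: exactly one treated unit per pair
treatment = np.zeros(2 * n_pairs, dtype=int)
for pair_id in range(n_pairs):
    idx = np.where(pairs == pair_id)[0]
    treatment[idx[0]] = np.random.binomial(1, 0.5)
    treatment[idx[1]] = 1 - treatment[idx[0]]

# Simulated outcome: pair-level baseline + treatment effect of 2
pair_effect = np.random.normal(0, 5, n_pairs)[pairs]
outcome = 50 + pair_effect + 2 * treatment + np.random.normal(0, 1, 2 * n_pairs)

df = pd.DataFrame({'pair': pairs, 'treatment': treatment, 'outcome': outcome})

# Pair fixed effects + standard errors clustered at the pair level
model = smf.ols('outcome ~ treatment + C(pair)', data=df).fit(
    cov_type='cluster', cov_kwds={'groups': df['pair']})
print(f"Estimated effect: {model.params['treatment']:.2f} (SE {model.bse['treatment']:.2f})")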

4. Cluster Randomization

Method: Randomize at the group level (e.g., classrooms, schools, villages)

Case: Education policy evaluation

  • Randomly select 50 schools to implement new policy
  • Another 50 schools as control group
  • Unit: Schools (not students)

Why Needed?

  • Spillover effects: Students in same class influence each other
  • Implementation feasibility: Cannot treat students differently within same school

Python Implementation:

python
# 100 schools
n_schools = 100
school_treatment = np.random.binomial(1, 0.5, n_schools)

# 30 students per school
students_per_school = 30
data = pd.DataFrame({
    'school_id': np.repeat(range(n_schools), students_per_school),
    'student_id': range(n_schools * students_per_school)
})

# Students inherit school treatment status
data['treatment'] = data['school_id'].map(
    dict(zip(range(n_schools), school_treatment))
)

print(data.groupby('school_id')['treatment'].first().value_counts())

Important Notes:

  • Standard errors need cluster adjustment (cluster-robust SEs); see the sketch below
  • The effective sample size is closer to the number of clusters than to the number of individuals
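A minimal sketch of cluster-robust inference, continuing from the data frame built above (the simulated outcome, effect size, and shock sizes are assumptions for illustration): estimate the treatment effect by OLS and cluster the standard errors at the school level.

python
import numpy as np
import statsmodels.formula.api as smf

# Continues from the snippet above: 'data', 'n_schools', 'school_treatment' already exist
# Simulated outcome with a school-level shock, so students within a school are correlated
school_shock = np.random.normal(0, 5, n_schools)
data['outcome'] = (60 + 3 * data['treatment']
                   + school_shock[data['school_id']]
                   + np.random.normal(0, 10, len(data)))

# OLS of the outcome on treatment, with standard errors clustered by school
model = smf.ols('outcome ~ treatment', data=data).fit(
    cov_type='cluster', cov_kwds={'groups': data['school_id']})
print(model.summary().tables[1])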

Balance Checks

Why Needed?

Although randomization theoretically ensures balance, actual data may have imbalances due to:

  • Limited sample size
  • Random fluctuation
  • Implementation biases

Check Methods

1. Descriptive Statistics Comparison

python
import pandas as pd
import numpy as np
from scipy import stats

# Simulate data
np.random.seed(42)
n = 200
data = pd.DataFrame({
    'treatment': np.random.binomial(1, 0.5, n),
    'age': np.random.normal(30, 5, n),
    'income': np.random.normal(50000, 15000, n),
    'education': np.random.randint(12, 22, n)
})

# Balance table
balance_table = data.groupby('treatment').agg({
    'age': ['mean', 'std'],
    'income': ['mean', 'std'],
    'education': ['mean', 'std']
}).T

balance_table.columns = ['Control', 'Treatment']
print(balance_table.round(2))

Output:

                Control  Treatment
age  mean        29.87      30.13
     std          5.12       4.88
income mean   49234.56   50876.23
       std    14987.45   15234.78
education mean   16.47      16.53
          std     2.89       2.91

2. t-Test (Continuous Variables)

python
def balance_test_continuous(data, var, treatment_col='treatment'):
    """
    Balance t-test for continuous variables
    """
    treated = data[data[treatment_col] == 1][var]
    control = data[data[treatment_col] == 0][var]

    t_stat, p_value = stats.ttest_ind(treated, control)

    return {
        'Variable': var,
        'Treatment Mean': treated.mean(),
        'Control Mean': control.mean(),
        'Difference': treated.mean() - control.mean(),
        't-statistic': t_stat,
        'p-value': p_value,
        'Balanced': '✓' if p_value > 0.05 else '✗'
    }

# Batch test
balance_results = []
for var in ['age', 'income', 'education']:
    balance_results.append(balance_test_continuous(data, var))

balance_df = pd.DataFrame(balance_results)
print(balance_df.round(3))

3. Chi-Square Test (Categorical Variables)

python
# Add categorical variables
data['gender'] = np.random.choice(['Male', 'Female'], n)
data['region'] = np.random.choice(['East', 'Central', 'West'], n)

def balance_test_categorical(data, var, treatment_col='treatment'):
    """
    Balance chi-square test for categorical variables
    """
    contingency_table = pd.crosstab(data[var], data[treatment_col])
    chi2, p_value, dof, expected = stats.chi2_contingency(contingency_table)

    return {
        'Variable': var,
        'Chi-square': chi2,
        'p-value': p_value,
        'Balanced': '✓' if p_value > 0.05 else '✗'
    }

# Batch test
cat_results = []
for var in ['gender', 'region']:
    cat_results.append(balance_test_categorical(data, var))

cat_df = pd.DataFrame(cat_results)
print(cat_df.round(3))

4. Joint F-Test

python
import statsmodels.api as sm

# Regression: treatment ~ age + income + education + ...
X = data[['age', 'income', 'education']]
X = sm.add_constant(X)
y = data['treatment']

model = sm.OLS(y, X).fit()
print(model.summary())

# F-test: H0: all coefficients = 0
print(f"\nF-statistic: {model.fvalue:.3f}")
print(f"Prob (F-statistic): {model.f_pvalue:.4f}")

if model.f_pvalue > 0.05:
    print("✓ Covariates jointly balanced (F-test not significant)")
else:
    print("✗ Covariates may be imbalanced (F-test significant)")

Interpreting Balance Checks

| p-value | Conclusion | Action |
| --- | --- | --- |
| > 0.1 | ✓ Well balanced | No adjustment needed |
| 0.05–0.1 | ⚠️ Borderline | Consider adding control variables |
| < 0.05 | ✗ Imbalanced | Add control variables or re-randomize |

Important Reminder:

  • A balance check is not an ordinary hypothesis test: we hope not to reject the null (we want p > 0.05)
  • Multiple testing problem: with 20 variables tested, expect about 1 to show p < 0.05 purely by chance (see the simulation below)
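A small simulation of the multiple-testing point (purely hypothetical covariates): even under perfect randomization, testing 20 balanced covariates will flag roughly one as "imbalanced" at the 5% level just by chance.

python
import numpy as np
from scipy import stats

np.random.seed(4)
n, n_covariates, n_experiments = 500, 20, 200

false_positives = []
for _ in range(n_experiments):
    D = np.random.binomial(1, 0.5, n)                    # valid randomization
    X = np.random.normal(0, 1, size=(n, n_covariates))   # covariates unrelated to D
    pvals = [stats.ttest_ind(X[D == 1, j], X[D == 0, j]).pvalue
             for j in range(n_covariates)]
    false_positives.append(sum(p < 0.05 for p in pvals))

print(f"Average number of 'imbalanced' covariates per experiment: "
      f"{np.mean(false_positives):.2f}  (expected ≈ {0.05 * n_covariates:.0f})")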

RCT Estimation Methods

Method 1: Simple Difference

python
# Simplest ATE estimate (assumes the data contain an 'outcome' column for the observed outcome)
ATE_simple = data[data['treatment'] == 1]['outcome'].mean() - \
             data[data['treatment'] == 0]['outcome'].mean()

# Standard error of the difference in means (allows unequal variances across groups)
n1 = (data['treatment'] == 1).sum()
n0 = (data['treatment'] == 0).sum()
s1 = data[data['treatment'] == 1]['outcome'].std()
s0 = data[data['treatment'] == 0]['outcome'].std()

se_simple = np.sqrt(s1**2 / n1 + s0**2 / n0)

# 95% confidence interval
ci_lower = ATE_simple - 1.96 * se_simple
ci_upper = ATE_simple + 1.96 * se_simple

print(f"ATE: {ATE_simple:.2f}")
print(f"95% CI: [{ci_lower:.2f}, {ci_upper:.2f}]")

Method 2: Regression Estimation

python
import statsmodels.api as sm

X = sm.add_constant(data['treatment'])
y = data['outcome']

model = sm.OLS(y, X).fit()
print(model.summary())

# ATE = coefficient on treatment variable
ATE_reg = model.params['treatment']
se_reg = model.bse['treatment']

print(f"\nATE (regression): {ATE_reg:.2f}")
print(f"Standard error: {se_reg:.2f}")

Advantages:

  • Can add control variables (improves precision)
  • Can report heteroskedasticity-robust standard errors (e.g., cov_type='HC3', as in Method 3)

Method 3: Regression + Control Variables

python
# Add control variables
X = data[['treatment', 'age', 'income', 'education']]
X = sm.add_constant(X)
y = data['outcome']

model_control = sm.OLS(y, X).fit(cov_type='HC3')  # Heteroskedasticity-robust SE
print(model_control.summary())

# Compare precision
print(f"\nWithout controls SE: {se_reg:.3f}")
print(f"With controls SE: {model_control.bse['treatment']:.3f}")
print(f"Precision improvement: {(1 - model_control.bse['treatment'] / se_reg) * 100:.1f}%")

Why Add Control Variables in RCT?

  • Under RCT, estimate is unbiased (with or without controls)
  • But adding controls can reduce standard errors (increase statistical power)

Advantages and Limitations of RCT

Advantages (Why the Gold Standard)

| Advantage | Explanation |
| --- | --- |
| Strongest internal validity | Randomization eliminates all confounders, observable and unobservable |
| No identifying assumptions needed | No functional-form or unobservables assumptions |
| Simple and transparent | Intuitive analysis, credible results |
| Heterogeneity analysis | Easy to study subgroup effects |

Limitations

| Limitation | Explanation | Response |
| --- | --- | --- |
| High cost | Time, money, personnel | Consider quasi-experimental methods |
| Ethical issues | Some treatments cannot be randomized (e.g., smoking) | Observational data + causal inference methods |
| External validity | Experimental sample may not represent the population | Multi-site replication |
| Hawthorne effect | Participants change behavior when observed | Double-blind design |
| Attrition | Tracking difficulties lead to missing data | Intention-to-treat (ITT) analysis |
| Non-compliance | Assigned to treatment but didn't receive it | Instrumental variables (IV) |

Classic RCT Cases

Case 1: PROGRESA (Mexico Conditional Cash Transfer)

Background:

  • 1997 Mexican government anti-poverty program
  • Cash transfers to poor families conditional on children attending school and regular health checkups

Experimental Design:

  • Randomly selected 320 villages for immediate implementation
  • 186 villages delayed 2 years (control group)
  • Unit: Villages (cluster randomization)

Results:

  • Child enrollment increased by 3.4 percentage points
  • Significant health improvements
  • Later scaled nationwide (renamed Oportunidades)

Key Lessons:

  • Policy rollout can use a phase-in design to create an RCT
  • The control group is only delayed, not permanently excluded from the program

Case 2: STAR Project (Tennessee Class Size)

Background:

  • 1985-1989 study of class size effects on academic achievement

Experimental Design:

  • 11,600 students randomly assigned to:
    • Small class (13-17 students)
    • Regular class (22-25 students)
    • Regular class + teaching aide
  • Unit: Students (within-school randomization)

Results:

  • Small class students had significantly higher achievement
  • Effects persisted to high school and adulthood
  • Larger effects for disadvantaged groups

Data: Publicly available dataset, widely used for teaching

python
# Load STAR data (example)
# data = pd.read_csv('STAR_data.csv')
# ATE = data[data['small_class'] == 1]['test_score'].mean() - \
#       data[data['small_class'] == 0]['test_score'].mean()

Case 3: Oregon Health Insurance Experiment

Background:

  • In 2008, Oregon expanded Medicaid (public health insurance for low-income residents)
  • Limited slots were allocated by lottery

Experimental Design:

  • Naturally formed RCT
  • Lottery winners could apply for Medicaid (treatment group)
  • Lottery losers as control group

Results:

  • Health insurance significantly reduced financial stress
  • Improved mental health
  • But effects on physical health indicators not significant (controversial)

Complete Python Practice: Simulating RCT

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import statsmodels.api as sm

# Set random seed
np.random.seed(42)

# ==== 1. Data Generation ====
n = 500

# Covariates
data = pd.DataFrame({
    'age': np.random.normal(30, 10, n),
    'income': np.random.lognormal(10.5, 0.5, n),
    'education': np.random.randint(9, 23, n)
})

# Random assignment (RCT)
data['treatment'] = np.random.binomial(1, 0.5, n)

# Outcome variable (simulate true causal effect = 1000)
data['Y0'] = (500 + 50 * data['age'] + 0.05 * data['income'] +
              100 * data['education'] + np.random.normal(0, 500, n))
data['Y1'] = data['Y0'] + 1000 + np.random.normal(0, 200, n)
data['Y_obs'] = np.where(data['treatment'] == 1, data['Y1'], data['Y0'])

# ==== 2. Balance Checks ====
print("=" * 50)
print("Balance Checks")
print("=" * 50)

for var in ['age', 'income', 'education']:
    treated = data[data['treatment'] == 1][var]
    control = data[data['treatment'] == 0][var]
    t_stat, p_val = stats.ttest_ind(treated, control)

    print(f"\n{var}:")
    print(f"  Treatment mean: {treated.mean():.2f}")
    print(f"  Control mean: {control.mean():.2f}")
    print(f"  Difference: {treated.mean() - control.mean():.2f}")
    print(f"  p-value: {p_val:.4f} {'✓' if p_val > 0.05 else '✗'}")

# ==== 3. ATE Estimation ====
print("\n" + "=" * 50)
print("ATE Estimation")
print("=" * 50)

# Method 1: Simple difference
ATE_simple = (data[data['treatment'] == 1]['Y_obs'].mean() -
              data[data['treatment'] == 0]['Y_obs'].mean())
print(f"\nSimple difference: {ATE_simple:.2f}")

# Method 2: Regression (no controls)
X = sm.add_constant(data['treatment'])
model1 = sm.OLS(data['Y_obs'], X).fit(cov_type='HC3')
print(f"\nRegression (no controls):")
print(f"  ATE: {model1.params['treatment']:.2f}")
print(f"  SE: {model1.bse['treatment']:.2f}")
print(f"  95% CI: [{model1.conf_int().loc['treatment', 0]:.2f}, "
      f"{model1.conf_int().loc['treatment', 1]:.2f}]")

# Method 3: Regression (with controls)
X_control = sm.add_constant(data[['treatment', 'age', 'income', 'education']])
model2 = sm.OLS(data['Y_obs'], X_control).fit(cov_type='HC3')
print(f"\nRegression (with controls):")
print(f"  ATE: {model2.params['treatment']:.2f}")
print(f"  SE: {model2.bse['treatment']:.2f}")
print(f"  95% CI: [{model2.conf_int().loc['treatment', 0]:.2f}, "
      f"{model2.conf_int().loc['treatment', 1]:.2f}]")
print(f"\nPrecision improvement: {(1 - model2.bse['treatment'] / model1.bse['treatment']) * 100:.1f}%")

# ==== 4. Visualization ====
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Plot 1: Treatment assignment
axes[0, 0].bar(['Control', 'Treatment'],
               data['treatment'].value_counts().sort_index(),
               color=['skyblue', 'salmon'])
axes[0, 0].set_ylabel('Sample Size')
axes[0, 0].set_title('Randomization Results')

# Plot 2: Outcome distribution comparison
axes[0, 1].hist(data[data['treatment'] == 0]['Y_obs'],
                bins=30, alpha=0.5, label='Control', color='skyblue')
axes[0, 1].hist(data[data['treatment'] == 1]['Y_obs'],
                bins=30, alpha=0.5, label='Treatment', color='salmon')
axes[0, 1].axvline(data[data['treatment'] == 0]['Y_obs'].mean(),
                   color='blue', linestyle='--', linewidth=2)
axes[0, 1].axvline(data[data['treatment'] == 1]['Y_obs'].mean(),
                   color='red', linestyle='--', linewidth=2)
axes[0, 1].set_xlabel('Outcome Variable')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].set_title('Outcome Distribution Comparison')
axes[0, 1].legend()

# Plot 3: Covariate balance (age)
axes[1, 0].boxplot([data[data['treatment'] == 0]['age'],
                    data[data['treatment'] == 1]['age']],
                   labels=['Control', 'Treatment'])
axes[1, 0].set_ylabel('Age')
axes[1, 0].set_title('Covariate Balance: Age')

# Plot 4: ATE estimate comparison
methods = ['Simple\nDiff', 'Regression\n(no controls)', 'Regression\n(with controls)']
estimates = [ATE_simple, model1.params['treatment'], model2.params['treatment']]
se = [0, model1.bse['treatment'], model2.bse['treatment']]

axes[1, 1].errorbar(methods, estimates, yerr=[1.96*s for s in se],
                    fmt='o', capsize=5, capthick=2, markersize=8)
axes[1, 1].axhline(1000, color='red', linestyle='--', label='True ATE')
axes[1, 1].set_ylabel('ATE Estimate')
axes[1, 1].set_title('ATE Estimates by Method')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('rct_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n✓ RCT analysis complete!")

Summary

Core Points of RCT

| Point | Content |
| --- | --- |
| Core mechanism | Randomization → Independence → Balance → Unbiased estimation |
| Design types | Simple / Stratified / Matched-pair / Cluster randomization |
| Balance checks | t-test, chi-square test, joint F-test |
| Estimation methods | Simple difference, regression, regression + controls |
| Advantages | Strongest internal validity |
| Limitations | High cost, ethical issues, external validity |

Key Insights

  1. Randomization is the Ultimate Weapon of Causal Inference

    • Eliminates all (observable and unobservable) confounders
    • Simple comparison yields unbiased estimates
  2. Balance Checks are Quality Assurance

    • Not hypothesis testing (want non-significance)
    • Must add controls if imbalanced
  3. RCT is Not a Panacea

    • Many important questions cannot use RCT (ethics, feasibility)
    • Need to combine with quasi-experimental methods

Practice Questions

  1. Design question: You want to study "the causal effect of remote work on employee productivity."

    • (a) Design an RCT
    • (b) Which type of randomization should you use? Why?
    • (c) What implementation challenges might you face?
  2. Analysis question: An RCT has 1000 people. After random assignment:

    • Treatment group average age 35, control group 30 (p = 0.02)
    • All other variables balanced (p > 0.1)

    Questions:

    • (a) Is this RCT valid?
    • (b) How should you analyze the data?
  3. Understanding question: Why does adding control variables in RCT not change unbiasedness of ATE estimate, but improve precision?

Answer hints

Question 1:

  • (b) Should use cluster randomization (by department), because spillover effects exist between employees in same department
  • (c) Challenges: Hawthorne effect, difficulty measuring productivity, attrition (employee turnover)

Question 2:

  • (a) RCT still valid, but age variable imbalanced
  • (b) Must control for age in regression

Question 3:

  • Control variables absorb part of outcome variable variance
  • Reduces residual standard deviation → reduces standard error → increases statistical power

Next Steps

In the next section, we'll deeply study Average Treatment Effects (ATE, ATT, LATE) and how to handle non-compliance in RCT.

Preview of Core Questions:

  • What's the difference between ATE, ATT, and LATE?
  • What is intention-to-treat (ITT) analysis?
  • How to use instrumental variables to handle non-compliance?

Keep going! 🚀


References:

  • Duflo, E., Glennerster, R., & Kremer, M. (2007). "Using randomization in development economics research: A toolkit". Handbook of Development Economics.
  • Angrist, J. D., & Pischke, J. S. (2008). Mostly Harmless Econometrics. Chapter 2.
  • Athey, S., & Imbens, G. W. (2017). "The econometrics of randomized experiments". Handbook of Field Experiments.
