2.3 Randomized Controlled Trials (RCTs)
"Randomization is the only method known to us that can be relied on to control for all relevant factors."— Abhijit Banerjee & Esther Duflo, 2019 Nobel Laureates in Economics
The Gold Standard of Causal Inference
Section Objectives
- Understand how randomization eliminates selection bias
- Master RCT design principles and types
- Learn balance check methods
- Understand advantages and limitations of RCT
The Magic of Randomization
Core Idea
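RCT = Randomized Controlled Trial: an experiment in which treatment status is assigned by chance rather than chosen by participants or researchers
Key Operation: Decide who receives treatment through a coin flip (random number generator)
Usually $P(D_i = 1) = 0.5$ (treatment and control groups each 50%), where $D_i = 1$ means individual $i$ is assigned to treatment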
Why Does Randomization Work?
Law of Large Numbers + Independence → Balance
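When sample size is large enough, random assignment ensures (with $D_i$ the treatment indicator, $X_i$ the covariates, and $Y_i(1), Y_i(0)$ the potential outcomes):
$$(Y_i(1), Y_i(0), X_i) \perp D_i \quad\Rightarrow\quad E[X_i \mid D_i = 1] = E[X_i \mid D_i = 0]$$
Meaning: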
- Both groups are on average the same across all characteristics
- Including observable characteristics (age, education) and unobservable characteristics (ability, motivation)
- The only difference between groups is whether they receive treatment
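The balance claim can be checked numerically. The following is a minimal simulation sketch, not part of the original material: `ability` is a hypothetical stand-in for an unobservable characteristic, and the logistic self-selection rule is an assumption chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# 'ability' stands in for an unobservable characteristic (hypothetical variable)
ability = rng.normal(0, 1, n)

# Random assignment: a fair coin flip, independent of ability
d_random = rng.binomial(1, 0.5, n)

# Self-selection (assumed rule): higher-ability individuals opt in more often
p_select = 1 / (1 + np.exp(-ability))
d_self = rng.binomial(1, p_select)


def mean_gap(x, d):
    """Difference in the mean of x between treated (d == 1) and control (d == 0)."""
    return x[d == 1].mean() - x[d == 0].mean()


gap_random = mean_gap(ability, d_random)
gap_self = mean_gap(ability, d_self)

print(f"Ability gap under random assignment: {gap_random:.3f}")
print(f"Ability gap under self-selection:    {gap_self:.3f}")
```

Under random assignment the gap in the unobserved variable should be close to zero, while under self-selection it stays large no matter how big the sample is.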
Simple Comparison is Unbiased Under RCT
Mathematical Proof
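Under RCT, simple comparison:
$$
\begin{aligned}
& E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] \\
&= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] \\
&= E[Y_i(1)] - E[Y_i(0)] \\
&= \text{ATE}
\end{aligned}
$$
Key Steps:
- Line 2: Observation rule ($Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$)
- Line 3: Randomization ensures $(Y_i(1), Y_i(0)) \perp D_i$
- Line 4: Definition of ATE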
Contrast: Without Randomization
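Selection bias reappears:
$$E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{ATT}} + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}$$
RCT Advantage: Second term (selection bias) = 0, and ATT = ATE because assignment is independent of potential outcomes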
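To make the contrast concrete, here is a small simulation sketch that splits the naive difference in means into a treatment effect plus a selection-bias term. Everything in it is assumed for illustration: the data-generating process, the constant effect of 3, and the self-selection rule.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

ability = rng.normal(0, 1, n)
y0 = 10 + 2 * ability + rng.normal(0, 1, n)  # potential outcome without treatment
y1 = y0 + 3                                  # constant true effect of 3 (assumption)


def naive_difference(d):
    """Observed difference in means between treated and control."""
    y_obs = np.where(d == 1, y1, y0)
    return y_obs[d == 1].mean() - y_obs[d == 0].mean()


def selection_bias(d):
    """E[Y(0) | D = 1] - E[Y(0) | D = 0], computable here because Y(0) is simulated."""
    return y0[d == 1].mean() - y0[d == 0].mean()


# Without randomization: people with higher ability tend to select into treatment
d_self = (ability + rng.normal(0, 1, n) > 0).astype(int)
# With randomization: fair coin flip
d_rct = rng.binomial(1, 0.5, n)

naive_self, bias_self = naive_difference(d_self), selection_bias(d_self)
naive_rct, bias_rct = naive_difference(d_rct), selection_bias(d_rct)

print(f"Self-selection: naive = {naive_self:.2f}, selection bias = {bias_self:.2f}")
print(f"RCT:            naive = {naive_rct:.2f}, selection bias = {bias_rct:.2f}")
```

Because the effect is constant in this sketch, the naive difference equals 3 plus the selection-bias term in both scenarios; only under random assignment is that second term approximately zero.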
RCT Design Types
1. Simple Randomization
Method: Each individual is independently assigned to treatment or control at random
import numpy as np
n = 1000
treatment = np.random.binomial(1, 0.5, n)  # independent fair coin flip per individual
Advantages:
- Simplest
- Eliminates selection bias in expectation
Disadvantages:
- Group sizes may be unequal by chance, especially in small samples (e.g., 480 treatment, 520 control)
- May be imbalanced on covariates
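One common remedy for unequal group sizes is to shuffle a fixed vector of assignments instead of flipping independent coins, often called complete randomization. The sketch below is an illustration using NumPy's `Generator` API; the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Simple randomization: independent coin flips, so group sizes vary by chance
simple = rng.binomial(1, 0.5, n)

# Complete randomization: shuffle a vector that contains exactly n/2 ones
complete = rng.permutation(np.repeat([0, 1], n // 2))

print(f"Simple:   {simple.sum()} treated, {n - simple.sum()} control")
print(f"Complete: {complete.sum()} treated, {n - complete.sum()} control")
```

Complete randomization fixes the group sizes but does not by itself balance covariates, which is what the stratified and matched-pair designs address.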
2. Stratified Randomization
Method: First stratify by covariates, then randomize within each stratum
Case: Education experiment
Stratum 1 (Elite high schools): 500 people → 250 treatment, 250 control
Stratum 2 (Regular high schools): 300 people → 150 treatment, 150 control
Stratum 3 (Vocational schools): 200 people → 100 treatment, 100 control
Python Implementation:
import pandas as pd
import numpy as np
# Generate data
np.random.seed(42)
data = pd.DataFrame({
'student_id': range(1000),
'school_type': np.random.choice(['Elite', 'Regular', 'Vocational'], 1000, p=[0.5, 0.3, 0.2])
})
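# Stratified randomization: treat exactly round(prob * n) units within each stratum
def stratified_randomization(df, strata_col, prob=0.5):
    df = df.copy()  # avoid mutating the caller's DataFrame
    df['treatment'] = 0
    for stratum in df[strata_col].unique():
        idx = df.index[df[strata_col] == stratum]
        n_treated = int(round(prob * len(idx)))
        # Draw treated units without replacement so the within-stratum split is exact
        treated_idx = np.random.choice(idx, size=n_treated, replace=False)
        df.loc[treated_idx, 'treatment'] = 1
    return df
data = stratified_randomization(data, 'school_type')
# Check balance within each stratum
print(data.groupby('school_type')['treatment'].value_counts())
Advantages: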
- Ensures balance on important covariates
- Increases statistical power
When to Use:
- Known variables have large effects on outcomes (e.g., gender, age, baseline scores)
- Want to ensure sufficient subsample sizes
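The power gain can be seen in a small simulation. This is a sketch under assumed conditions: a hypothetical population of 200 units in three strata, an outcome that depends strongly on the stratum, and a constant true effect of 5. It compares how much the difference-in-means estimate varies across repeated assignments under simple versus stratified randomization.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical population: the stratum strongly predicts the outcome
strata = np.repeat([0, 1, 2], [100, 60, 40])
n = len(strata)
y0 = 50 + 10 * strata + rng.normal(0, 2, n)
y1 = y0 + 5  # constant true effect of 5 (assumption)


def diff_in_means(d):
    y = np.where(d == 1, y1, y0)
    return y[d == 1].mean() - y[d == 0].mean()


def assign_simple():
    # Independent coin flip per unit
    return rng.binomial(1, 0.5, n)


def assign_stratified():
    # Treat exactly half of the units inside every stratum
    d = np.zeros(n, dtype=int)
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        treated = rng.choice(idx, size=len(idx) // 2, replace=False)
        d[treated] = 1
    return d


reps = 2000
sd_simple = np.std([diff_in_means(assign_simple()) for _ in range(reps)])
sd_stratified = np.std([diff_in_means(assign_stratified()) for _ in range(reps)])

print(f"SD of estimate, simple randomization:     {sd_simple:.3f}")
print(f"SD of estimate, stratified randomization: {sd_stratified:.3f}")
```

The more strongly the stratifying variable predicts the outcome, the larger the reduction in sampling variability; if the strata were unrelated to the outcome, the two designs would perform about the same.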
3. Matched-Pair Randomization
Method: First match similar individuals, then randomize within each pair
Case: School experiment
- Form 50 pairs of similar schools (matched by region, size, performance)
- Within each pair, randomly assign one school to treatment, the other to control
Python Implementation:
# Assume 100 schools, paired two by two
n_pairs = 50
pairs = np.repeat(range(n_pairs), 2) # [0,0,1,1,2,2,...]
# Randomize within each pair
treatment = np.zeros(100, dtype=int)
for pair_id in range(n_pairs):
    pair_indices = np.where(pairs == pair_id)[0]
    # Coin flip for the first school; the second school gets the opposite status
    treatment[pair_indices[0]] = np.random.binomial(1, 0.5)
    treatment[pair_indices[1]] = 1 - treatment[pair_indices[0]]
print(f"Treatment group size: {treatment.sum()}")  # Exactly 50: one treated school per pair
Advantages:
- Automatically balances covariates
- Exactly equal sample sizes
Disadvantages:
- Need to find suitable matches first
- Analysis must account for the paired structure (e.g., pair fixed effects, within-pair differences, or standard errors clustered by pair)
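One way to account for the paired structure is to analyze within-pair differences. The sketch below uses simulated data: the pair-level baseline, the noise levels, and the true effect of 2 are all assumptions made for illustration. It compares the paired standard error with the one obtained by wrongly treating the two groups as independent samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_pairs = 50

# Hypothetical data: the two schools in a pair share a common baseline level
pair_effect = rng.normal(0, 10, n_pairs)
true_effect = 2.0  # assumed for the simulation

y_control = 60 + pair_effect + rng.normal(0, 2, n_pairs)
y_treated = 60 + pair_effect + true_effect + rng.normal(0, 2, n_pairs)

# Paired analysis: work with within-pair differences
diffs = y_treated - y_control
ate_hat = diffs.mean()
se_paired = diffs.std(ddof=1) / np.sqrt(n_pairs)
t_paired, p_paired = stats.ttest_rel(y_treated, y_control)

# Ignoring the pairing treats the groups as two independent samples
se_unpaired = np.sqrt(y_treated.var(ddof=1) / n_pairs + y_control.var(ddof=1) / n_pairs)

print(f"Estimated effect: {ate_hat:.2f}")
print(f"SE (paired):      {se_paired:.3f}")
print(f"SE (unpaired):    {se_unpaired:.3f}")
```

Differencing removes the shared pair-level component, so the paired standard error is much smaller whenever matching was done on variables that predict the outcome.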
4. Cluster Randomization
Method: Randomize at the group level (e.g., classrooms, schools, villages)
Case: Education policy evaluation
- Randomly select 50 schools to implement new policy
- Another 50 schools as control group
- Unit: Schools (not students)
Why Needed?
- Spillover effects: Students in same class influence each other
- Implementation feasibility: Cannot treat students differently within same school
Python Implementation:
# 100 schools
n_schools = 100
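# Shuffle a vector with exactly 50 ones so that exactly half of the schools are treated
school_treatment = np.random.permutation(np.repeat([0, 1], n_schools // 2))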
# 30 students per school
students_per_school = 30
data = pd.DataFrame({
'school_id': np.repeat(range(n_schools), students_per_school),
'student_id': range(n_schools * students_per_school)
})
# Students inherit school treatment status
data['treatment'] = data['school_id'].map(
dict(zip(range(n_schools), school_treatment))
)
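print(data.groupby('school_id')['treatment'].first().value_counts())
Important Notes:
- Standard errors need cluster adjustment (Cluster-Robust SE)
- Effective sample size is driven by the number of clusters, not the number of individuals; the stronger the within-cluster correlation, the closer it is to the cluster count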
Balance Checks
Why Needed?
Although randomization theoretically ensures balance, actual data may have imbalances due to:
- Limited sample size
- Random fluctuation
- Implementation biases
Check Methods
1. Descriptive Statistics Comparison
import numpy as np
import pandas as pd
from scipy import stats
# Simulate data
np.random.seed(42)
n = 200
data = pd.DataFrame({
'treatment': np.random.binomial(1, 0.5, n),
'age': np.random.normal(30, 5, n),
'income': np.random.normal(50000, 15000, n),
'education': np.random.randint(12, 22, n)
})
# Balance table
balance_table = data.groupby('treatment').agg({
'age': ['mean', 'std'],
'income': ['mean', 'std'],
'education': ['mean', 'std']
}).T
balance_table.columns = ['Control', 'Treatment']
print(balance_table.round(2))
Output:
                 Control  Treatment
age       mean     29.87      30.13
          std       5.12       4.88
income    mean  49234.56   50876.23
          std   14987.45   15234.78
education mean     16.47      16.53
          std       2.89       2.91
2. t-Test (Continuous Variables)
def balance_test_continuous(data, var, treatment_col='treatment'):
    """
    Balance t-test for continuous variables
    """
    treated = data[data[treatment_col] == 1][var]
    control = data[data[treatment_col] == 0][var]
    t_stat, p_value = stats.ttest_ind(treated, control)
    return {
        'Variable': var,
        'Treatment Mean': treated.mean(),
        'Control Mean': control.mean(),
        'Difference': treated.mean() - control.mean(),
        't-statistic': t_stat,
        'p-value': p_value,
        'Balanced': '✓' if p_value > 0.05 else '✗'
    }
# Batch test
balance_results = []
for var in ['age', 'income', 'education']:
    balance_results.append(balance_test_continuous(data, var))
balance_df = pd.DataFrame(balance_results)
print(balance_df.round(3))
3. Chi-Square Test (Categorical Variables)
# Add categorical variables
data['gender'] = np.random.choice(['Male', 'Female'], n)
data['region'] = np.random.choice(['East', 'Central', 'West'], n)
def balance_test_categorical(data, var, treatment_col='treatment'):
    """
    Balance chi-square test for categorical variables
    """
    contingency_table = pd.crosstab(data[var], data[treatment_col])
    chi2, p_value, dof, expected = stats.chi2_contingency(contingency_table)
    return {
        'Variable': var,
        'Chi-square': chi2,
        'p-value': p_value,
        'Balanced': '✓' if p_value > 0.05 else '✗'
    }
# Batch test
cat_results = []
for var in ['gender', 'region']:
    cat_results.append(balance_test_categorical(data, var))
cat_df = pd.DataFrame(cat_results)
print(cat_df.round(3))
4. Joint F-Test
import statsmodels.api as sm
# Regression: treatment ~ age + income + education + ...
X = data[['age', 'income', 'education']]
X = sm.add_constant(X)
y = data['treatment']
model = sm.OLS(y, X).fit()
print(model.summary())
# F-test: H0: all coefficients = 0
print(f"\nF-statistic: {model.fvalue:.3f}")
print(f"Prob (F-statistic): {model.f_pvalue:.4f}")
if model.f_pvalue > 0.05:
    print("✓ Covariates jointly balanced (F-test not significant)")
else:
    print("✗ Covariates may be imbalanced (F-test significant)")
Interpreting Balance Checks
| p-value | Conclusion | Action |
|---|---|---|
| > 0.1 | ✓ Very balanced | No adjustment needed |
| 0.05-0.1 | ⚠️ Borderline | Can add control variables |
| < 0.05 | ✗ Imbalanced | Must add control variables or re-randomize |
Important Reminder:
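- Balance check ≠ conventional hypothesis test: it is a diagnostic of the randomization, not a search for an effect
- We hope not to reject the null hypothesis (we want p > 0.05), although failing to reject does not prove perfect balance
- Multiple testing problem: when testing 20 variables, expect about 1 with p < 0.05 by chance alone, even if randomization was carried out correctly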
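The multiple testing point can be verified by simulation. The sketch below repeats a correctly randomized experiment many times with 20 covariates that are unrelated to assignment by construction, and counts how often a balance t-test comes out "significant". The sample size and number of repetitions are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, n_covariates, n_experiments = 200, 20, 500

false_alarms = []
for _ in range(n_experiments):
    d = rng.binomial(1, 0.5, n)               # valid random assignment
    X = rng.normal(0, 1, (n, n_covariates))   # covariates unrelated to assignment
    # One t-test per covariate (column-wise)
    _, p_values = stats.ttest_ind(X[d == 1], X[d == 0], axis=0)
    false_alarms.append(int((p_values < 0.05).sum()))

avg_false_alarms = np.mean(false_alarms)
share_with_any = np.mean(np.array(false_alarms) > 0)

print(f"Average number of 'imbalanced' covariates per experiment: {avg_false_alarms:.2f}")
print(f"Share of experiments with at least one p < 0.05:          {share_with_any:.2f}")
```

A single p-value below 0.05 among many covariates is therefore weak evidence of a failed randomization; the joint F-test and the size of the differences are more informative.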
RCT Estimation Methods
Method 1: Simple Difference
# Simplest ATE estimate (assumes `data` contains an observed 'outcome' column)
ATE_simple = data[data['treatment'] == 1]['outcome'].mean() - \
             data[data['treatment'] == 0]['outcome'].mean()
# Standard error (unequal-variance formula; does not assume homoskedasticity)
n1 = (data['treatment'] == 1).sum()
n0 = (data['treatment'] == 0).sum()
s1 = data[data['treatment'] == 1]['outcome'].std()
s0 = data[data['treatment'] == 0]['outcome'].std()
se_simple = np.sqrt(s1**2 / n1 + s0**2 / n0)
# 95% confidence interval
ci_lower = ATE_simple - 1.96 * se_simple
ci_upper = ATE_simple + 1.96 * se_simple
print(f"ATE: {ATE_simple:.2f}")
print(f"95% CI: [{ci_lower:.2f}, {ci_upper:.2f}]")
Method 2: Regression Estimation
import statsmodels.api as sm
X = sm.add_constant(data['treatment'])
y = data['outcome']
model = sm.OLS(y, X).fit()
print(model.summary())
# ATE = coefficient on treatment variable
ATE_reg = model.params['treatment']
se_reg = model.bse['treatment']
print(f"\nATE (regression): {ATE_reg:.2f}")
print(f"Standard error: {se_reg:.2f}")
Advantages:
- Can add control variables (improve precision)
- Can report heteroskedasticity-robust standard errors (e.g., by passing cov_type='HC3' to fit)
Method 3: Regression + Control Variables
# Add control variables
X = data[['treatment', 'age', 'income', 'education']]
X = sm.add_constant(X)
y = data['outcome']
model_control = sm.OLS(y, X).fit(cov_type='HC3') # Heteroskedasticity-robust SE
print(model_control.summary())
# Compare precision
print(f"\nWithout controls SE: {se_reg:.3f}")
print(f"With controls SE: {model_control.bse['treatment']:.3f}")
print(f"Precision improvement: {(1 - model_control.bse['treatment'] / se_reg) * 100:.1f}%")
Why Add Control Variables in RCT?
- Under RCT, estimate is unbiased (with or without controls)
- But adding controls can reduce standard errors (increase statistical power)
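A back-of-the-envelope formula makes the precision gain explicit. It is an approximation under simplifying assumptions that are introduced here for illustration: homoskedastic errors, a treatment share $p$, and controls that are uncorrelated with treatment (which randomization delivers in expectation). $R^2$ denotes the share of residual outcome variance explained by the controls.

```latex
\mathrm{Var}(\hat{\tau}) \approx \frac{\sigma_{\varepsilon}^{2}}{n \, p \, (1 - p)},
\qquad
\sigma_{\varepsilon,\text{controls}}^{2} = (1 - R^{2}) \, \sigma_{\varepsilon}^{2}
\quad\Rightarrow\quad
\frac{\mathrm{SE}_{\text{controls}}}{\mathrm{SE}_{\text{no controls}}} \approx \sqrt{1 - R^{2}}
```

For example, if the controls explain 36% of the remaining outcome variance, the ratio is $\sqrt{0.64} = 0.8$, so the standard error falls by roughly 20% while the point estimate stays unbiased.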
Advantages and Limitations of RCT
Advantages (Why the Gold Standard)
| Advantage | Explanation |
|---|---|
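| Strongest Internal Validity | Randomization balances all confounders, observed and unobserved, in expectation |
| Few Assumptions Needed | No functional form or selection-on-observables assumptions; still requires no interference between units, compliance, and limited attrition |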
| Simple and Transparent | Intuitive analysis, credible results |
| Heterogeneity Analysis | Easy to study subgroup effects |
Limitations
| Limitation | Explanation | Response |
|---|---|---|
| High Cost | Time, money, personnel | Consider quasi-experimental methods |
| Ethical Issues | Some treatments cannot be randomized (e.g., smoking) | Observational data + causal inference methods |
| External Validity | Experimental sample may not represent population | Multi-site replication |
| Hawthorne Effect | Participants change behavior when observed | Double-blind design |
| Attrition | Tracking difficulties lead to missing data | Compare attrition rates across arms; bounds analysis |
| Non-compliance | Assigned to treatment but didn't receive it | Intention-to-treat (ITT) analysis; instrumental variables (IV) |
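The responses to non-compliance listed in the table can be sketched in a few lines. This is a simulated illustration under stated assumptions: one-sided non-compliance, a 60% take-up rate among those assigned, and a constant true effect of 4. It computes the ITT effect and the Wald (IV) ratio that rescales it by the difference in take-up.

```python
import numpy as np

rng = np.random.default_rng(21)
n = 100_000

z = rng.binomial(1, 0.5, n)         # random assignment (the instrument)
complier = rng.binomial(1, 0.6, n)  # 60% take the treatment if assigned (assumed)
d = z * complier                    # treatment actually received (one-sided non-compliance)

true_effect = 4.0                   # assumed constant effect of receiving treatment
y = 20 + true_effect * d + rng.normal(0, 5, n)

# ITT: effect of being assigned to treatment
itt = y[z == 1].mean() - y[z == 0].mean()
# First stage: effect of assignment on treatment take-up
first_stage = d[z == 1].mean() - d[z == 0].mean()
# Wald / IV estimate: effect of treatment for those who comply
wald = itt / first_stage

print(f"ITT:         {itt:.2f}")
print(f"First stage: {first_stage:.2f}")
print(f"Wald (IV):   {wald:.2f}")
```

The ITT is diluted by those who never take the treatment; dividing by the take-up difference recovers the effect among compliers, which in this constant-effect sketch equals the assumed effect of 4.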
Classic RCT Cases
Case 1: PROGRESA (Mexico Conditional Cash Transfer)
Background:
- 1997 Mexican government anti-poverty program
- Cash transfers to poor families conditional on children attending school and regular health checkups
Experimental Design:
- Randomly selected 320 villages for immediate implementation
- 186 villages delayed 2 years (control group)
- Unit: Villages (cluster randomization)
Results:
- Child enrollment increased by 3.4 percentage points
- Significant health improvements
- Later scaled nationwide (renamed Oportunidades)
Key Lessons:
- Policy rollout can use phase-in design for RCT
- Control group only delayed not permanently deprived
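A phase-in assignment like the one described can be sketched as follows. The village counts come from the case above; the column names, the seed, and the use of a simple random draw (rather than whatever procedure the program actually followed) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1997)
n_villages, n_early = 506, 320  # 320 early + 186 delayed, as in the case description

villages = pd.DataFrame({"village_id": np.arange(n_villages)})

# Draw the early-implementation villages without replacement
early_ids = rng.choice(n_villages, size=n_early, replace=False)
villages["phase"] = np.where(villages["village_id"].isin(early_ids), "early", "delayed")

phase_counts = villages["phase"].value_counts()
print(phase_counts)
```

During the window before the delayed villages are enrolled, the "delayed" group serves as the control group; once they enter the program, the experimental comparison ends.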
Case 2: STAR Project (Tennessee Class Size)
Background:
- 1985-1989 study of class size effects on academic achievement
Experimental Design:
- 11,600 students randomly assigned to:
- Small class (13-17 students)
- Regular class (22-25 students)
- Regular class + teaching aide
- Unit: Students (within-school randomization)
Results:
- Small class students had significantly higher achievement
- Effects persisted to high school and adulthood
- Larger effects for disadvantaged groups
Data: Publicly available dataset, widely used for teaching
# Load STAR data (example)
# data = pd.read_csv('STAR_data.csv')
# ATE = data[data['small_class'] == 1]['test_score'].mean() - \
# data[data['small_class'] == 0]['test_score'].mean()
Case 3: Oregon Health Insurance Experiment
Background:
- In 2008, Oregon expanded Medicaid (public health insurance for low-income residents)
- Limited slots were allocated by lottery
Experimental Design:
- The lottery created a naturally occurring randomized experiment
- Lottery winners could apply for Medicaid (treatment group)
- Lottery losers served as the control group
Results:
- Health insurance significantly reduced financial stress
- Improved mental health
- But effects on physical health indicators not significant (controversial)
Complete Python Practice: Simulating RCT
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm
# Set random seed
np.random.seed(42)
# ==== 1. Data Generation ====
n = 500
# Covariates
data = pd.DataFrame({
'age': np.random.normal(30, 10, n),
'income': np.random.lognormal(10.5, 0.5, n),
'education': np.random.randint(9, 23, n)
})
# Random assignment (RCT)
data['treatment'] = np.random.binomial(1, 0.5, n)
# Outcome variable (simulate true causal effect = 1000)
data['Y0'] = (500 + 50 * data['age'] + 0.05 * data['income'] +
100 * data['education'] + np.random.normal(0, 500, n))
data['Y1'] = data['Y0'] + 1000 + np.random.normal(0, 200, n)
data['Y_obs'] = np.where(data['treatment'] == 1, data['Y1'], data['Y0'])
# ==== 2. Balance Checks ====
print("=" * 50)
print("Balance Checks")
print("=" * 50)
for var in ['age', 'income', 'education']:
    treated = data[data['treatment'] == 1][var]
    control = data[data['treatment'] == 0][var]
    t_stat, p_val = stats.ttest_ind(treated, control)
    print(f"\n{var}:")
    print(f"  Treatment mean: {treated.mean():.2f}")
    print(f"  Control mean: {control.mean():.2f}")
    print(f"  Difference: {treated.mean() - control.mean():.2f}")
    print(f"  p-value: {p_val:.4f} {'✓' if p_val > 0.05 else '✗'}")
# ==== 3. ATE Estimation ====
print("\n" + "=" * 50)
print("ATE Estimation")
print("=" * 50)
# Method 1: Simple difference
ATE_simple = (data[data['treatment'] == 1]['Y_obs'].mean() -
data[data['treatment'] == 0]['Y_obs'].mean())
print(f"\nSimple difference: {ATE_simple:.2f}")
# Method 2: Regression (no controls)
X = sm.add_constant(data['treatment'])
model1 = sm.OLS(data['Y_obs'], X).fit(cov_type='HC3')
print(f"\nRegression (no controls):")
print(f" ATE: {model1.params['treatment']:.2f}")
print(f" SE: {model1.bse['treatment']:.2f}")
print(f" 95% CI: [{model1.conf_int().loc['treatment', 0]:.2f}, "
f"{model1.conf_int().loc['treatment', 1]:.2f}]")
# Method 3: Regression (with controls)
X_control = sm.add_constant(data[['treatment', 'age', 'income', 'education']])
model2 = sm.OLS(data['Y_obs'], X_control).fit(cov_type='HC3')
print(f"\nRegression (with controls):")
print(f" ATE: {model2.params['treatment']:.2f}")
print(f" SE: {model2.bse['treatment']:.2f}")
print(f" 95% CI: [{model2.conf_int().loc['treatment', 0]:.2f}, "
f"{model2.conf_int().loc['treatment', 1]:.2f}]")
print(f"\nPrecision improvement: {(1 - model2.bse['treatment'] / model1.bse['treatment']) * 100:.1f}%")
# ==== 4. Visualization ====
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Plot 1: Treatment assignment
axes[0, 0].bar(['Control', 'Treatment'],
data['treatment'].value_counts().sort_index(),
color=['skyblue', 'salmon'])
axes[0, 0].set_ylabel('Sample Size')
axes[0, 0].set_title('Randomization Results')
# Plot 2: Outcome distribution comparison
axes[0, 1].hist(data[data['treatment'] == 0]['Y_obs'],
bins=30, alpha=0.5, label='Control', color='skyblue')
axes[0, 1].hist(data[data['treatment'] == 1]['Y_obs'],
bins=30, alpha=0.5, label='Treatment', color='salmon')
axes[0, 1].axvline(data[data['treatment'] == 0]['Y_obs'].mean(),
color='blue', linestyle='--', linewidth=2)
axes[0, 1].axvline(data[data['treatment'] == 1]['Y_obs'].mean(),
color='red', linestyle='--', linewidth=2)
axes[0, 1].set_xlabel('Outcome Variable')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].set_title('Outcome Distribution Comparison')
axes[0, 1].legend()
# Plot 3: Covariate balance (age)
axes[1, 0].boxplot([data[data['treatment'] == 0]['age'],
                    data[data['treatment'] == 1]['age']])
# Set tick labels separately so the code works across Matplotlib versions
axes[1, 0].set_xticklabels(['Control', 'Treatment'])
axes[1, 0].set_ylabel('Age')
axes[1, 0].set_title('Covariate Balance: Age')
# Plot 4: ATE estimate comparison
methods = ['Simple\nDiff', 'Regression\n(no controls)', 'Regression\n(with controls)']
estimates = [ATE_simple, model1.params['treatment'], model2.params['treatment']]
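# Standard error of the simple difference (unequal-variance formula) instead of a zero-width bar
g1 = data.loc[data['treatment'] == 1, 'Y_obs']
g0 = data.loc[data['treatment'] == 0, 'Y_obs']
se_simple = np.sqrt(g1.var() / len(g1) + g0.var() / len(g0))
se = [se_simple, model1.bse['treatment'], model2.bse['treatment']]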
axes[1, 1].errorbar(methods, estimates, yerr=[1.96*s for s in se],
fmt='o', capsize=5, capthick=2, markersize=8)
axes[1, 1].axhline(1000, color='red', linestyle='--', label='True ATE')
axes[1, 1].set_ylabel('ATE Estimate')
axes[1, 1].set_title('ATE Estimates by Method')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('rct_analysis.png', dpi=300, bbox_inches='tight')
plt.show()
print("\n✓ RCT analysis complete!")
Summary
Core Points of RCT
| Point | Content |
|---|---|
| Core Mechanism | Randomization → Independence → Balance → Unbiased estimation |
| Design Types | Simple/Stratified/Matched/Cluster randomization |
| Balance Checks | t-test, chi-square test, F-test |
| Estimation Methods | Simple difference, regression, regression + controls |
| Advantages | Strongest internal validity |
| Limitations | High cost, ethical issues, external validity |
Key Insights
Randomization is the Ultimate Weapon of Causal Inference
- Balances all (observable and unobservable) confounders in expectation
- Simple comparison yields unbiased estimates
Balance Checks are Quality Assurance
- Not hypothesis testing (want non-significance)
- Must add controls if imbalanced
RCT is Not a Panacea
- Many important questions cannot use RCT (ethics, feasibility)
- Need to combine with quasi-experimental methods
Practice Questions
Design question: You want to study "the causal effect of remote work on employee productivity."
- (a) Design an RCT
- (b) Which type of randomization should you use? Why?
- (c) What implementation challenges might you face?
Analysis question: An RCT has 1000 people. After random assignment:
- Treatment group average age 35, control group 30 (p = 0.02)
- All other variables balanced (p > 0.1)
Questions:
- (a) Is this RCT valid?
- (b) How should you analyze the data?
Understanding question: Why does adding control variables in RCT not change unbiasedness of ATE estimate, but improve precision?
Answer hints
Question 1:
- (b) Should use cluster randomization (by department), because spillover effects exist between employees in same department
- (c) Challenges: Hawthorne effect, difficulty measuring productivity, attrition (employee turnover)
Question 2:
- (a) RCT still valid, but age variable imbalanced
- (b) Must control for age in regression
Question 3:
- Control variables absorb part of outcome variable variance
- Reduces residual standard deviation → reduces standard error → increases statistical power
Next Steps
In the next section, we take a closer look at treatment effect parameters (ATE, ATT, LATE) and how to handle non-compliance in RCTs.
Preview of Core Questions:
- What's the difference between ATE, ATT, and LATE?
- What is intention-to-treat (ITT) analysis?
- How to use instrumental variables to handle non-compliance?
Keep going! 🚀
References:
- Duflo, E., Glennerster, R., & Kremer, M. (2007). "Using randomization in development economics research: A toolkit". Handbook of Development Economics.
- Angrist, J. D., & Pischke, J. S. (2008). Mostly Harmless Econometrics. Chapter 2.
- Athey, S., & Imbens, G. W. (2017). "The econometrics of randomized experiments". Handbook of Field Experiments.