9.4 Placebo Tests
A Critical Method for Validating DID Robustness
Learning Objectives
- Understand the rationale and role of placebo tests
- Master common placebo test designs and applicable scenarios
- Implement common placebo tests using Python
I. Why Placebo Tests Are Needed
In the DID framework, the core identifying assumption is parallel trends. Even after pre-trend tests, we still need placebo tests to verify a counterfactual claim: if the policy were "fake" (implemented at the wrong time, assigned to the wrong group, or evaluated on an unrelated outcome), the estimated "policy effect" should be close to 0 and insignificant. Otherwise, the model may be attributing other factors to the policy.
Common motivations:
- Exclude coincidence (other events happening at the same time point)
- Exclude selectivity (spurious effects caused by differences between treatment and control groups)
- Exclude model specification issues (inappropriate controls, clustering methods, trend specifications, etc.)
II. Common Placebo Test Methods (Recommended Priority Order)
Placebo Time
- Approach: Shift the assumed policy date several periods earlier (or later, using only data unaffected by the real policy) and re-estimate DID.
- Expectation: Coefficients should be close to 0 and insignificant.
- Applicable: When the time dimension is long and there are no other densely spaced interventions.
Placebo Group
- Approach: Randomly designate unaffected units similar to the treatment group as a fake "treatment group" and re-estimate DID.
- Expectation: The DID coefficient distribution is centered at 0.
- Applicable: When there are many cross-sectional units.
Leave-One-Out Robustness (exclude one treated unit/region at a time)
- Approach: Exclude one treated unit at a time and re-estimate (a sketch appears at the end of Section III).
- Expectation: Results are not driven by any single unit.
Randomization Inference (Permutation Test)
- Approach: Keep the time structure, randomly reshuffle the treatment assignment, repeat B times, and form the distribution of "placebo effects" (a sketch appears at the end of Section III).
- Expectation: The true effect lies in the tail of the permutation distribution, yielding an approximate p-value.
Placebo Outcome Variable
- Approach: Select an outcome variable "theoretically unaffected by the policy" and run DID.
- Expectation: Estimated coefficient ≈ 0.
III. Minimal Python Implementation Templates
For quick adoption, below are minimal, replicable "placebo test" code templates (consistent with 9.3 style, UTF-8 encoding).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
# df: contains columns y, treated(0/1), time(integer), post(0/1), id
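# --- Optional: a hypothetical simulated panel for trying out the templates ---
# (illustrative only, not from the original text; replace with your own data)
def make_fake_panel(n_id=40, n_t=10, cut=6, effect=2.0, seed=0):
    """Simulate a balanced panel: units 0-9 treated from period `cut` on,
    with a true effect of `effect` added to treated post-period outcomes."""
    rng = np.random.default_rng(seed)
    ids = np.repeat(np.arange(n_id), n_t)
    time = np.tile(np.arange(n_t), n_id)
    treated = (ids < 10).astype(int)
    post = (time >= cut).astype(int)
    y = 0.1 * ids + 0.3 * time + effect * treated * post + rng.normal(0, 1, n_id * n_t)
    return pd.DataFrame({'id': ids, 'time': time, 'treated': treated, 'post': post, 'y': y})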
def did_once(df):
    """Two-way fixed-effects DID regression (simplest form; extend as needed).

    The treated and post main effects are absorbed by the entity and time
    fixed effects, so only the interaction enters the formula."""
    model = smf.ols('y ~ treated:post + C(id) + C(time)', data=df).fit(
        cov_type='cluster', cov_kwds={'groups': df['id']})
    return model.params.get('treated:post', np.nan)
def placebo_time(df, shift=2):
    """Move the policy date `shift` periods earlier, rebuild post, and
    re-estimate DID on pre-treatment data only."""
    df2 = df.copy()
    true_cut = df2.loc[df2['post'] == 1, 'time'].min()  # true policy period
    fake_cut = true_cut - shift                          # fake, earlier cutoff
    # Keep only pre-treatment periods so the real effect cannot contaminate the placebo
    df2 = df2[df2['time'] < true_cut].copy()
    df2['post'] = (df2['time'] >= fake_cut).astype(int)
    return did_once(df2)
def placebo_group(df, n_draw=200, seed=42):
    """Randomly relabel control units (same number as the true treatment
    group) as a fake "treatment group", repeat n_draw times, and return
    the placebo-effect distribution."""
    rng = np.random.default_rng(seed)
    treated_ids = df.loc[df['treated'] == 1, 'id'].unique()
    control_ids = np.setdiff1d(df['id'].unique(), treated_ids)
    k = len(treated_ids)
    assert k <= len(control_ids), "not enough control units to relabel"
    # Estimate among controls only, so the real effect cannot contaminate the placebo
    df_ctrl = df[~df['id'].isin(treated_ids)].copy()
    fake_ate = []
    for _ in range(n_draw):
        fake_treated = set(rng.choice(control_ids, size=k, replace=False))
        df2 = df_ctrl.copy()
        df2['treated'] = df2['id'].isin(fake_treated).astype(int)
        fake_ate.append(did_once(df2))
    return pd.Series(fake_ate, name='placebo_group_ate')

Usage recommendation: First run the true DID once and record the ATE; then run placebo_time / placebo_group, plot the placebo distribution, and mark where the true ATE falls.
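The leave-one-out and permutation designs from Section II can be implemented in the same style. Below is a minimal sketch under the same assumed column layout; leave_one_out and permutation_test are illustrative names, not library functions.

def leave_one_out(df):
    """Exclude one treated unit at a time and re-estimate DID;
    returns a Series of ATEs indexed by the excluded unit."""
    treated_ids = df.loc[df['treated'] == 1, 'id'].unique()
    ates = {uid: did_once(df[df['id'] != uid]) for uid in treated_ids}
    return pd.Series(ates, name='leave_one_out_ate')

def permutation_test(df, n_perm=500, seed=42):
    """Randomization inference: keep the time structure, reshuffle the
    treatment label across units n_perm times, and compare the true DID
    coefficient with the permutation distribution."""
    rng = np.random.default_rng(seed)
    ids = df['id'].unique()
    k = int((df.groupby('id')['treated'].first() == 1).sum())  # number of treated units
    true_ate = did_once(df)
    perm_ate = []
    for _ in range(n_perm):
        fake_treated = set(rng.choice(ids, size=k, replace=False))
        df2 = df.copy()
        df2['treated'] = df2['id'].isin(fake_treated).astype(int)
        perm_ate.append(did_once(df2))
    perm_ate = pd.Series(perm_ate, name='perm_ate')
    # Two-sided permutation p-value: share of placebo effects at least as extreme
    p_value = float((perm_ate.abs() >= abs(true_ate)).mean())
    return true_ate, perm_ate, p_value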
IV. Interpreting Results and Considerations
- Placebo Time: Across a series of fake time points, coefficients should fluctuate around 0; if any are significant, beware of temporal confounding.
- Placebo Group: The placebo-effect distribution should be centered at 0, and the true ATE should fall in its tail (see the sketch after this list).
- Leave-One-Out: If results remain stable no matter which unit is excluded, the effect is not driven by a single special unit.
- Permutation Test: The permutation p-value can be reported as robustness evidence.
- Placebo Outcome: Clearly explain why this outcome should not be affected by the policy.
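As a concrete illustration of the suggestions above, the following sketch plots a placebo distribution, marks the true ATE, and reports an approximate p-value. It assumes matplotlib is installed and reuses the hypothetical helpers from Section III.

import matplotlib.pyplot as plt

df = make_fake_panel()                    # hypothetical simulated panel
true_ate = did_once(df)                   # true DID estimate
placebo = placebo_group(df, n_draw=200)   # placebo-group effect distribution

# Histogram of placebo effects; the true ATE should sit in the tail
plt.hist(placebo, bins=30)
plt.axvline(true_ate, color='red', linestyle='--', label=f'true ATE = {true_ate:.2f}')
plt.legend()
plt.title('Placebo-group distribution vs. true ATE')
plt.show()

# Approximate two-sided p-value from the placebo distribution
p_val = (placebo.abs() >= abs(true_ate)).mean()
print(f'placebo p-value: {p_val:.3f}')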
Technical details:
- Clustered standard errors: With panel data, cluster at the entity level (or use two-way clustering).
- Multi-period/staggered treatment: Use an event-study specification (see 9.3) or recent multi-period (staggered) DID estimators.
Section Summary
- Placebo tests are "counterfactual stress tests" for DID identifying assumptions.
- Common methods: fake time points, fake control groups, leave-one-out, permutation tests, placebo outcomes.
- The key expectation is "close to 0 and insignificant"; if a placebo effect is significant, return to the identification and specification stage to find the cause.
- When reporting, recommend combining: plots (distributions/event studies), tables (ATE and p-values), and textual explanations.