
9.5 Classic Cases and Python Implementation

From Real Research to Reproducible Code


Learning Objectives

  • Understand classic DID application scenarios
  • Replicate complete DID workflow using synthetic data
  • Master code templates transferable to real datasets

I. Overview of Classic Cases (Conceptual Summary)

  • Minimum Wage Policy: Using employment/wages as outcome variables, examining differences before and after policy implementation across affected and unaffected regions.
  • Environmental Regulation/Tax Policies: Examining impacts on production capacity, emissions, investment, etc.
  • Education/Healthcare Reforms: Examining impacts on student performance, health indicators, etc.

Reminder: Whichever case you study, always return to the core question: "Is the parallel trends assumption reasonable?" (see 9.3).


II. Synthetic Data Demonstration: Complete DID from Scratch (Runnable)

python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Parameters
n_units = 60      # Number of units
n_periods = 10    # Number of time periods
policy_time = 6   # Policy implementation period (starting from 1)

rng = np.random.default_rng(42)
ids = np.arange(n_units)
periods = np.arange(1, n_periods + 1)

data = []
for i in ids:
    treated = 1 if i >= n_units // 2 else 0
    unit_fe = rng.normal(0, 3)
    for t in periods:
        time_fe = 0.5 * t
        post = 1 if t >= policy_time else 0
        tau = 4.0  # True policy effect
        y = 20 + unit_fe + time_fe + tau * treated * post + rng.normal(0, 2)
        data.append({
            'id': i,
            'time': t,
            'treated': treated,
            'post': post,
            'y': y
        })

df = pd.DataFrame(data)

# DID regression (entity/time fixed effects, standard errors clustered at the unit level).
# The treated and post main effects are absorbed by the unit and time fixed effects,
# so only the interaction (the DID term) enters explicitly.
model = smf.ols('y ~ treated:post + C(id) + C(time)', data=df) \
            .fit(cov_type='cluster', cov_kwds={'groups': df['id']})
print(model.summary().tables[1])
# With this simulated data, the estimate should be close to the true effect tau = 4.0
print('\nDID estimate (treated:post) =', model.params['treated:post'])
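
Before moving on, it can help to see the same estimate as a raw 2x2 difference in group means. The short check below uses the df just constructed; it has no fixed effects or clustering, so it is only a cross-check of the DID logic, not a replacement for the regression.

python
# Raw 2x2 DID: (treated post - treated pre) - (control post - control pre)
means = df.groupby(['treated', 'post'])['y'].mean().unstack('post')
print(means)
raw_did = (means.loc[1, 1] - means.loc[1, 0]) - (means.loc[0, 1] - means.loc[0, 0])
print('Raw 2x2 DID =', round(raw_did, 2))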

Extensions:

  • Event study to check pre-trends and dynamic effects (see 9.3 and the Appendix below).
  • For multi-period/staggered treatment timing, prefer recent heterogeneity-robust DID estimators (e.g., Sun & Abraham, Callaway & Sant'Anna implementations); a simplified sketch of the underlying idea follows this list.
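
To make the staggered case concrete, the sketch below works through a simplified, manual version of the cohort-by-event-time idea behind Sun & Abraham's estimator, using synthetic data with two adoption cohorts and a never-treated group. Everything in it (the cohort assignment, the dynamic effect of 3 + 0.5k, the dname helper, the cohort-size weights) is an illustrative assumption; for real applications, prefer the maintained package implementations.

python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng2 = np.random.default_rng(7)
cohort_of = {0: 0, 1: 5, 2: 7}   # adoption period by group; 0 = never treated

rows = []
for i in range(90):
    g = cohort_of[i % 3]
    unit_fe = rng2.normal(0, 3)
    for t in range(1, 11):
        if g > 0 and t >= g:
            effect = 3.0 + 0.5 * (t - g)   # true effect grows with event time
        else:
            effect = 0.0
        rows.append({'id': i, 'time': t, 'cohort': g,
                     'rel_time': (t - g) if g > 0 else np.nan,
                     'y': 10 + unit_fe + 0.4 * t + effect + rng2.normal(0, 2)})
sdf = pd.DataFrame(rows)

def dname(g, k):
    # patsy-safe dummy name ('-' would be read as subtraction inside the formula)
    return f'G{g}_m{abs(k)}' if k < 0 else f'G{g}_p{k}'

# Cohort-by-event-time dummies, omitting event time -1 as the baseline
dummies = []
for g in (5, 7):
    for k in range(1 - g, 11 - g):
        if k == -1:
            continue
        sdf[dname(g, k)] = ((sdf['cohort'] == g) & (sdf['rel_time'] == k)).astype(int)
        dummies.append(dname(g, k))

formula = 'y ~ ' + ' + '.join(dummies) + ' + C(id) + C(time)'
res = smf.ols(formula, data=sdf).fit(cov_type='cluster', cov_kwds={'groups': sdf['id']})

# Aggregate cohort-specific estimates at each shared post-treatment event time,
# weighting by cohort size (a simplified stand-in for the interaction-weighted step)
for k in range(0, 4):
    w = np.array([(sdf['cohort'] == g).sum() for g in (5, 7)], dtype=float)
    w /= w.sum()
    att_k = float(np.dot(w, [res.params[dname(g, k)] for g in (5, 7)]))
    print(f'event time {k}: estimate = {att_k:.2f} (true = {3.0 + 0.5 * k:.2f})')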

III. Real Data Example (Operational Template)

  1. Organize data into long format: one row = one unit × one period (contains id, time, treated, post, y)
  2. Conduct 9.3's "pre-trends and event study" to assess identifying assumptions
  3. Estimate basic DID + robustness (clustering/two-way clustering, controlling for trends)
  4. Conduct 9.4's placebo tests: fake time points/fake control groups/leave-one-out/permutation tests/placebo outcomes (a permutation-test sketch follows this list)
  5. Report: main results + pre-trends/event study plots + placebo tests + explanations
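
As one concrete way to run the permutation test in step 4, the sketch below reuses the synthetic df from Section II: treatment status is repeatedly reassigned at random across units (keeping the group size fixed), the DID coefficient is re-estimated each time, and the share of placebo estimates at least as extreme as the actual one gives an approximate p-value. The did_coef helper and the 200 draws are illustrative choices, not part of any standard API.

python
import numpy as np
import statsmodels.formula.api as smf

def did_coef(frame):
    """DID interaction coefficient with unit/time fixed effects and clustered SEs."""
    res = smf.ols('y ~ treated:post + C(id) + C(time)', data=frame) \
             .fit(cov_type='cluster', cov_kwds={'groups': frame['id']})
    return res.params['treated:post']

actual = did_coef(df)

perm_rng = np.random.default_rng(123)
unit_ids = df['id'].unique()
placebo = []
for _ in range(200):
    # Randomly pick a fake treated group of the same size as the real one
    fake_treated = perm_rng.choice(unit_ids, size=len(unit_ids) // 2, replace=False)
    shuffled = df.copy()
    shuffled['treated'] = shuffled['id'].isin(fake_treated).astype(int)
    placebo.append(did_coef(shuffled))

placebo = np.array(placebo)
p_value = np.mean(np.abs(placebo) >= abs(actual))
print(f'actual DID = {actual:.2f}, permutation p-value = {p_value:.3f}')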

Section Summary

  • Classic DID cases are numerous, but identification logic always revolves around "parallel trends."
  • First use reproducible synthetic data to work through the complete workflow, then transfer to real data.
  • Template-based thinking: data structure, regression formula, clustering methods, visualization and robustness.

Appendix: Event Study Plot (Consistent with 9.3 Style)

Using the df constructed in Section II, estimate and plot the dynamic treatment effects in event-study form.

python
import matplotlib.pyplot as plt
from linearmodels.panel import PanelOLS

# Construct relative time (policy period = 0; pre-policy negative, post-policy positive)
df['rel_time'] = df['time'] - policy_time

# Generate leads/lags (period -1 as baseline)
min_lead, max_lag = - (policy_time - 1), (n_periods - policy_time)
lead_lag_vars = []
for k in range(min_lead, max_lag + 1):
    if k == -1:
        continue  # Baseline period doesn't get dummy
    col = f'LL_{k}'
    df[col] = ((df['rel_time'] == k) & (df['treated'] == 1)).astype(int)  # 1 only for treated units at event time k
    lead_lag_vars.append(col)

# Panel indexing
panel = df.set_index(['id', 'time'])

# Regression (entity and time fixed effects)
model_es = PanelOLS(
    dependent=panel['y'],
    exog=panel[lead_lag_vars],
    entity_effects=True,
    time_effects=True
).fit(cov_type='clustered', cluster_entity=True)

print(model_es.summary)

# Extract coefficients and confidence intervals, construct series including baseline period (-1)
rows = []
for k in range(min_lead, max_lag + 1):
    if k == -1:
        rows.append({'rel_time': k, 'coef': 0.0, 'low': 0.0, 'high': 0.0})
    else:
        name = f'LL_{k}'
        coef = float(model_es.params[name])
        ci = model_es.conf_int().loc[name]
        rows.append({'rel_time': k, 'coef': coef, 'low': float(ci.iloc[0]), 'high': float(ci.iloc[1])})

es = pd.DataFrame(rows).sort_values('rel_time')

# Plotting
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(es['rel_time'], es['coef'], 'o-', color='navy', label='DID Coefficient')
ax.fill_between(es['rel_time'], es['low'], es['high'], color='navy', alpha=0.25, label='95% CI')
ax.axhline(0, color='black', linestyle='--', linewidth=1)
ax.axvline(0, color='red', linestyle='--', linewidth=1.5, label='Policy Implementation Period')
ax.set_xlabel('Relative Time (pre-policy negative, post-policy positive)')
ax.set_ylabel('Effect')
ax.set_title('Event Study: Dynamic Treatment Effects')
ax.legend()
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()
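
By construction of the simulated data, the pre-period coefficients in this plot should hover around zero (there is no pre-trend), while the post-period coefficients should fluctuate around the true effect of 4. In real applications, this flat-pre/clear-post pattern is exactly what supports the identification argument of 9.3.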
