Chapter 7 Introduction (Time Series Analysis and Event Study)
Understanding socioeconomic phenomena through the time dimension: Trends, cycles, and causality
Chapter Objectives
Upon completing this chapter, you will be able to:
- Understand the basic characteristics of time series data (trend, seasonality, cycle)
- Master time series decomposition methods and applications
- Use ARIMA models for forecasting
- Conduct stationarity tests (ADF, KPSS)
- Implement event study methodology
- Evaluate policy effects and shock impacts
- Use Python toolkits (statsmodels, pandas)
Why is Time Series Analysis So Important?
The Time Dimension in Social Sciences
Time series data is everywhere:
- Macroeconomics: GDP, inflation, unemployment rate, interest rates
- Financial Economics: Stock prices, exchange rates, futures prices
- Labor Economics: Employment numbers, wage levels, labor participation rates
- Public Policy: Pre-post policy comparison, policy effect evaluation
- Sociology: Crime rates, birth rates, education enrollment rates
Time Series vs Cross-sectional Data
| Characteristic | Cross-sectional Data | Time Series Data |
|---|---|---|
| Observation Objects | Multiple individuals, single time point | Single/multiple individuals, multiple time points |
| Independence Assumption | Usually satisfied (i.i.d.) | Violated (serial correlation) |
| Typical Issues | Individual differences, omitted variables | Trend, seasonality, autocorrelation |
| Analysis Methods | OLS, cross-sectional regression | ARIMA, VAR, cointegration |
| Causal Inference | RCT, IV, RDD | DID, event study, breakpoint |
Classic Case: Box & Jenkins (1970)
Airline Passenger Data: monthly international airline passenger counts from 1949-1960, the canonical dataset of Box & Jenkins. The code below illustrates the same characteristics (trend, seasonality, noise) with another classic series, the atmospheric CO₂ data shipped with statsmodels.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.datasets import co2
# Load classic dataset
data = co2.load_pandas().data
# Visualization
fig, axes = plt.subplots(2, 1, figsize=(14, 8))
# Original series
axes[0].plot(data.index, data.values, linewidth=1.5)
axes[0].set_title('Atmospheric CO₂ Concentration (1958-2001)', fontsize=14, fontweight='bold')
axes[0].set_ylabel('CO₂ (ppm)')
axes[0].grid(True, alpha=0.3)
# Annual average
yearly = data.resample('Y').mean()
axes[1].plot(yearly.index, yearly.values, linewidth=2, marker='o')
axes[1].set_title('Annual Average CO₂ Concentration', fontsize=14, fontweight='bold')
axes[1].set_ylabel('CO₂ (ppm)')
axes[1].set_xlabel('Year')
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Observed Characteristics:
- Trend: Long-term upward trend
- Seasonality: Annual cyclical fluctuations
- Random Variation: Unpredictable short-term fluctuations
Core Concepts of Time Series
1. Stationarity
Definition: The statistical properties of a time series do not change over time
Strict Stationarity: the joint distribution of $(y_{t_1}, \dots, y_{t_k})$ is invariant to shifts in time
Weak Stationarity / Covariance Stationarity:
- $E[y_t] = \mu$ (constant mean)
- $\mathrm{Var}(y_t) = \sigma^2$ (constant variance)
- $\mathrm{Cov}(y_t, y_{t-k}) = \gamma_k$ (autocovariance depends only on the lag $k$)
Why is Stationarity Important?
- Non-stationary series may lead to spurious regression
- Many time series models (like ARIMA) require stationary series
- Forecasting depends on stable statistical properties
Illustration:
np.random.seed(42)
n = 200
# Stationary series: white noise
stationary = np.random.normal(0, 1, n)
# Non-stationary series: random walk
random_walk = np.cumsum(np.random.normal(0, 1, n))
# Non-stationary series: deterministic trend
trend = np.arange(n) * 0.1 + np.random.normal(0, 1, n)
fig, axes = plt.subplots(3, 1, figsize=(14, 10))
axes[0].plot(stationary)
axes[0].set_title('Stationary Series: White Noise', fontsize=14, fontweight='bold')
axes[0].axhline(y=0, color='r', linestyle='--', alpha=0.5)
axes[0].set_ylabel('Value')
axes[0].grid(True, alpha=0.3)
axes[1].plot(random_walk, color='orange')
axes[1].set_title('Non-stationary Series: Random Walk (Unit Root)', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Value')
axes[1].grid(True, alpha=0.3)
axes[2].plot(trend, color='green')
axes[2].set_title('Non-stationary Series: Deterministic Trend', fontsize=14, fontweight='bold')
axes[2].set_xlabel('Time')
axes[2].set_ylabel('Value')
axes[2].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
2. Autocorrelation
Definition: Correlation between a time series and its lagged values
Autocorrelation Function (ACF):
$\rho_k = \dfrac{\mathrm{Cov}(y_t, y_{t-k})}{\mathrm{Var}(y_t)} = \dfrac{\gamma_k}{\gamma_0}$
Partial Autocorrelation Function (PACF):
- Correlation between $y_t$ and $y_{t-k}$ after controlling for the intermediate lags $y_{t-1}, \dots, y_{t-k+1}$
- Used to identify AR order
Example:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Generate AR(1) process
np.random.seed(42)
phi = 0.8
ar1 = [0]
for i in range(1, 200):
    ar1.append(phi * ar1[-1] + np.random.normal(0, 1))
fig, axes = plt.subplots(1, 3, figsize=(16, 5))
# Time series plot
axes[0].plot(ar1)
axes[0].set_title(r'AR(1) Series: $y_t = 0.8 y_{t-1} + \epsilon_t$', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3)
# ACF
plot_acf(ar1, lags=20, ax=axes[1], alpha=0.05)
axes[1].set_title('Autocorrelation Function (ACF)', fontsize=14, fontweight='bold')
# PACF
plot_pacf(ar1, lags=20, ax=axes[2], alpha=0.05)
axes[2].set_title('Partial Autocorrelation Function (PACF)', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
Interpretation:
- ACF: Exponential decay (AR process characteristic)
- PACF: Lag 1 significant, others insignificant (identified as AR(1))
3. Time Series Model Family
Time Series Models
├── Classical Decomposition Models
│ ├── Additive Model: Y = T + S + R
│ └── Multiplicative Model: Y = T × S × R
│
├── Exponential Smoothing
│ ├── Simple Exponential Smoothing (SES)
│ ├── Holt Linear Trend
│ └── Holt-Winters Seasonal
│
├── ARIMA Family
│ ├── AR (Autoregressive)
│ ├── MA (Moving Average)
│ ├── ARMA
│ ├── ARIMA (with differencing)
│ └── SARIMA (seasonal)
│
├── Vector Models
│ ├── VAR (Vector Autoregression)
│ ├── VECM (Vector Error Correction Model)
│ └── Structural VAR (SVAR)
│
└── Advanced Models
├── GARCH (Volatility Models)
├── State Space Models
    └── Prophet (Facebook)
Event Study Methodology
Definition and Applications
Event Study: Evaluating the impact of a specific event on a variable
Classic Application Scenarios:
Financial Markets:
- Impact of mergers and acquisitions on stock prices
- Impact of earnings announcements on stock returns
- Impact of regulatory policies on market volatility
Public Policy:
- Impact of minimum wage laws on employment
- Impact of environmental regulations on pollution
- Impact of education reforms on test scores
Natural Disasters:
- Impact of earthquakes on economic activity
- Impact of epidemics on consumer behavior
Basic Framework of Event Study
Timeline
├── Estimation Window
│ └── Establish "normal" benchmark
│
├── Event Window
│ ├── Pre-event
│ ├── Event Day
│ └── Post-event
│
└── Post-event Window
    └── Long-term effect evaluation
Core Metrics:
- Abnormal Return (AR): $AR_{it} = R_{it} - E[R_{it} \mid X_t]$ (actual return minus the normal return predicted by the benchmark model)
- Cumulative Abnormal Return (CAR): $CAR_i(t_1, t_2) = \sum_{t=t_1}^{t_2} AR_{it}$
Illustration:
# Simulate event study
np.random.seed(123)
n = 200
event_day = 100
# Normal period returns
normal_returns = np.random.normal(0.001, 0.02, n)
# Add abnormal returns from event day onwards
abnormal_effect = 0.05
normal_returns[event_day:] += abnormal_effect
# Calculate cumulative returns
cumulative_returns = np.cumsum(normal_returns)
fig, axes = plt.subplots(2, 1, figsize=(14, 8))
# Daily returns
axes[0].plot(normal_returns, alpha=0.7)
axes[0].axvline(x=event_day, color='r', linestyle='--', linewidth=2, label='Event Day')
axes[0].axhline(y=0, color='black', linestyle='-', alpha=0.3)
axes[0].set_title('Daily Returns', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Returns')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Cumulative returns
axes[1].plot(cumulative_returns, linewidth=2)
axes[1].axvline(x=event_day, color='r', linestyle='--', linewidth=2, label='Event Day')
axes[1].set_title('Cumulative Returns', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Time (Days)')
axes[1].set_ylabel('Cumulative Returns')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Chapter Structure
Section 1: Time Series Basics
- Characteristics of time series data
- Time series processing in Python (pandas)
- Data stationarity testing (ADF, KPSS, PP)
- Differencing and transformation
- Case: Stationarity analysis of China's GDP growth rate
Section 2: Time Series Decomposition
- Classical decomposition methods (additive/multiplicative models)
- STL decomposition (Seasonal-Trend Decomposition using Loess)
- Seasonal adjustment (X-13ARIMA-SEATS)
- Trend extraction and filtering (HP filter, BK filter)
- Case: Seasonal adjustment of Consumer Price Index (CPI)
Section 3: ARIMA Models
- AR, MA, ARMA models
- Unit root and differencing
- ARIMA model identification, estimation, diagnosis
- Model selection (AIC, BIC)
- Forecasting and forecast intervals
- SARIMA (Seasonal ARIMA)
- Case: Unemployment rate forecasting
Section 4: Event Study Methodology
- Basic framework of event studies
- Normal return models (market model, mean-adjusted model)
- Calculation of abnormal returns
- Significance testing (t-test, rank test)
- Multiple event studies
- Case: Impact of merger announcements on stock prices
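The section's core computation can be sketched as follows: estimate a market model on the estimation window, then compute abnormal returns and their cumulative sum (CAR) over the event window. All data are simulated; a +2%/day abnormal effect is injected for five days, so the true CAR is about 0.10.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated daily returns: 250-day estimation window + 5-day event window
n_est, n_event = 250, 5
mkt = rng.normal(0.0005, 0.01, n_est + n_event)
alpha, beta = 0.0002, 1.2
stock = alpha + beta * mkt + rng.normal(0, 0.005, n_est + n_event)
stock[n_est:] += 0.02  # inject a +2%/day abnormal effect in the event window

# Market model R_it = alpha + beta * R_mt, estimated on the estimation window only
b, a = np.polyfit(mkt[:n_est], stock[:n_est], 1)  # slope (beta), intercept (alpha)

# Abnormal return = actual - predicted normal return; CAR = their sum
ar = stock[n_est:] - (a + b * mkt[n_est:])
car = ar.sum()
print(f"CAR over event window = {car:.4f}")  # close to the injected 0.10
```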
Section 5: Summary and Review
- Knowledge system summary
- 10 high-difficulty programming exercises
- Classic literature recommendations
- Learning path guide
Python Toolkits
Core Libraries
| Library | Main Functions | Installation |
|---|---|---|
| pandas | Time series data processing | pip install pandas |
| statsmodels | ARIMA, VAR, cointegration | pip install statsmodels |
| scipy | Statistical testing | pip install scipy |
| matplotlib | Visualization | pip install matplotlib |
| seaborn | Advanced visualization | pip install seaborn |
Specialized Libraries
| Library | Main Functions | Installation |
|---|---|---|
| arch | GARCH models | pip install arch |
| pmdarima | Auto ARIMA | pip install pmdarima |
| prophet | Facebook forecasting tool | pip install prophet |
| ruptures | Breakpoint detection | pip install ruptures |
Basic Setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, kpss, acf, pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# Chinese font settings
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS'] # macOS
# plt.rcParams['font.sans-serif'] = ['SimHei'] # Windows
plt.rcParams['axes.unicode_minus'] = False
# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['figure.dpi'] = 100
# pandas display settings
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 20)
pd.set_option('display.width', 1000)
Classic Literature
Must-Read Papers
Box, G. E. P., & Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control
- Foundational work on time series analysis
- Systematic introduction to ARIMA models
Dickey, D. A., & Fuller, W. A. (1979). "Distribution of the Estimators for Autoregressive Time Series with a Unit Root"
- ADF unit root test
- Standard method for stationarity testing
Fama, E. F., Fisher, L., Jensen, M. C., & Roll, R. (1969). "The Adjustment of Stock Prices to New Information"
- Seminal work on event study methodology
- Empirical support for efficient market hypothesis
Engle, R. F., & Granger, C. W. J. (1987). "Co-integration and Error Correction"
- Cointegration theory (Nobel Prize in Economics)
- Long-run equilibrium relationships
Applied Literature
Card, D., & Krueger, A. B. (1994). "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania"
- Difference-in-differences (DID) method
- Natural experiment design
Bertrand, M., Duflo, E., & Mullainathan, S. (2004). "How Much Should We Trust Differences-In-Differences Estimates?"
- Serial correlation issues in DID
- Clustered standard errors
Learning Path Recommendations
Beginner (1-2 weeks)
Goal: Master basic time series data processing
Learning Content:
- Section 1: Stationarity concepts, ADF test
- First half of Section 2: Time series decomposition
- Practice: Using real data (GDP, CPI)
Recommended Resources:
- Wooldridge (2020): Chapter 10 "Basic Regression Analysis with Time Series Data"
Intermediate Learning (3-4 weeks)
Goal: Independently build ARIMA models
Learning Content:
- Second half of Section 2: STL decomposition
- Section 3: Complete ARIMA modeling process
- Practice: Forecasting unemployment rate, inflation rate
Recommended Resources:
- Stock & Watson (2020): Chapter 14 "Introduction to Time Series Regression"
Advanced Application (5-6 weeks)
Goal: Implement event studies and VAR analysis
Learning Content:
- Section 4: Event study methodology
- Practice: Policy effect evaluation
Recommended Resources:
- Hamilton (1994): Time Series Analysis (Classic textbook)
- Lütkepohl (2005): New Introduction to Multiple Time Series Analysis
Practical Recommendations
Common Pitfalls in Time Series Analysis
Spurious Regression
- Problem: Two non-stationary series may show high correlation but are actually unrelated
- Solution: Test for stationarity, use differencing or cointegration
Over-differencing
- Problem: Unnecessary differencing introduces MA components
- Solution: Use joint ADF and KPSS testing
Ignoring Structural Breaks
- Problem: Policy changes, crises lead to parameter changes
- Solution: Chow Test, Bai-Perron breakpoint testing
Small Sample Issues
- Problem: Time series typically have small sample sizes
- Solution: Interpret cautiously, use robust standard errors
Data Quality Checklist
- [ ] Check for missing values (interpolation vs deletion)
- [ ] Identify outliers (extreme events vs measurement errors)
- [ ] Confirm time unit and frequency (daily, monthly, quarterly)
- [ ] Check for seasonality (holiday effects)
- [ ] Plot time series (trend, breakpoints)
- [ ] Calculate basic statistics (mean, variance, skewness, kurtosis)
Get Started
Ready to explore the mysteries of the time dimension?
Let's begin with Section 1: Time Series Basics!
Remember:
"The best way to predict the future is to understand the past."
Time Series Analysis: Understanding the past, predicting the future, evaluating causality!