
Chapter 7 Introduction (Time Series Analysis and Event Study)

Understanding socioeconomic phenomena through the time dimension: Trends, cycles, and causality



Chapter Objectives

Upon completing this chapter, you will be able to:

  • Understand the basic characteristics of time series data (trend, seasonality, cycle)
  • Master time series decomposition methods and applications
  • Use ARIMA models for forecasting
  • Conduct stationarity tests (ADF, KPSS)
  • Implement event study methodology
  • Evaluate policy effects and shock impacts
  • Use Python toolkits (statsmodels, pandas)

Why is Time Series Analysis So Important?

The Time Dimension in Social Sciences

Time series data is everywhere:

  • Macroeconomics: GDP, inflation, unemployment rate, interest rates
  • Financial Economics: Stock prices, exchange rates, futures prices
  • Labor Economics: Employment numbers, wage levels, labor participation rates
  • Public Policy: Pre-post policy comparison, policy effect evaluation
  • Sociology: Crime rates, birth rates, education enrollment rates

Time Series vs Cross-sectional Data

| Characteristic | Cross-sectional Data | Time Series Data |
| --- | --- | --- |
| Observation objects | Multiple individuals, single time point | Single/multiple individuals, multiple time points |
| Independence assumption | Usually satisfied (i.i.d.) | Violated (serial correlation) |
| Typical issues | Individual differences, omitted variables | Trend, seasonality, autocorrelation |
| Analysis methods | OLS, cross-sectional regression | ARIMA, VAR, cointegration |
| Causal inference | RCT, IV, RDD | DID, event study, structural breaks |

Classic Case: Box & Jenkins (1970)

Airline Passenger Data: Monthly international airline passenger counts from 1949-1960 are the canonical example from Box & Jenkins. The code below illustrates the same three features (trend, seasonality, randomness) with another classic dataset that ships with statsmodels: weekly atmospheric CO₂ measurements from Mauna Loa (1958-2001).

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.datasets import co2

# Load classic dataset
data = co2.load_pandas().data

# Visualization
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Original series
axes[0].plot(data.index, data.values, linewidth=1.5)
axes[0].set_title('Atmospheric CO₂ Concentration (1958-2001)', fontsize=14, fontweight='bold')
axes[0].set_ylabel('CO₂ (ppm)')
axes[0].grid(True, alpha=0.3)

# Annual average ('YE' on pandas >= 2.2; use 'Y' on older versions)
yearly = data.resample('YE').mean()
axes[1].plot(yearly.index, yearly.values, linewidth=2, marker='o')
axes[1].set_title('Annual Average CO₂ Concentration', fontsize=14, fontweight='bold')
axes[1].set_ylabel('CO₂ (ppm)')
axes[1].set_xlabel('Year')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Observed Characteristics:

  1. Trend: Long-term upward trend
  2. Seasonality: Annual cyclical fluctuations
  3. Random Variation: Unpredictable short-term fluctuations

Core Concepts of Time Series

1. Stationarity

Definition: Statistical properties of a time series do not change over time

Strict Stationarity: the joint distribution of $(y_{t_1}, \dots, y_{t_k})$ is identical to that of $(y_{t_1+h}, \dots, y_{t_k+h})$ for every shift $h$ and every choice of time points.

Weak Stationarity / Covariance Stationarity:

  1. $E[y_t] = \mu$ (constant mean)
  2. $\mathrm{Var}(y_t) = \sigma^2 < \infty$ (constant variance)
  3. $\mathrm{Cov}(y_t, y_{t-k}) = \gamma_k$ (autocovariance depends only on lag $k$)

Why is Stationarity Important?

  • Non-stationary series may lead to spurious regression
  • Many time series models (like ARIMA) require stationary series
  • Forecasting depends on stable statistical properties

Illustration:

python
np.random.seed(42)
n = 200

# Stationary series: white noise
stationary = np.random.normal(0, 1, n)

# Non-stationary series: random walk
random_walk = np.cumsum(np.random.normal(0, 1, n))

# Non-stationary series: deterministic trend
trend = np.arange(n) * 0.1 + np.random.normal(0, 1, n)

fig, axes = plt.subplots(3, 1, figsize=(14, 10))

axes[0].plot(stationary)
axes[0].set_title('Stationary Series: White Noise', fontsize=14, fontweight='bold')
axes[0].axhline(y=0, color='r', linestyle='--', alpha=0.5)
axes[0].set_ylabel('Value')
axes[0].grid(True, alpha=0.3)

axes[1].plot(random_walk, color='orange')
axes[1].set_title('Non-stationary Series: Random Walk (Unit Root)', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Value')
axes[1].grid(True, alpha=0.3)

axes[2].plot(trend, color='green')
axes[2].set_title('Non-stationary Series: Deterministic Trend', fontsize=14, fontweight='bold')
axes[2].set_xlabel('Time')
axes[2].set_ylabel('Value')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

2. Autocorrelation

Definition: Correlation between a time series and its lagged values

Autocorrelation Function (ACF):

$\rho_k = \dfrac{\mathrm{Cov}(y_t, y_{t-k})}{\mathrm{Var}(y_t)} = \dfrac{\gamma_k}{\gamma_0}$

Partial Autocorrelation Function (PACF):

  • Correlation between $y_t$ and $y_{t-k}$ after controlling for the intermediate lags $y_{t-1}, \dots, y_{t-k+1}$
  • Used to identify AR order

Example:

python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Generate AR(1) process
np.random.seed(42)
phi = 0.8
ar1 = [0]
for i in range(1, 200):
    ar1.append(phi * ar1[-1] + np.random.normal(0, 1))

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Time series plot
axes[0].plot(ar1)
axes[0].set_title(r'AR(1) Series: $y_t = 0.8 y_{t-1} + \epsilon_t$', fontsize=14, fontweight='bold')  # raw string: avoids the invalid \e escape
axes[0].grid(True, alpha=0.3)

# ACF
plot_acf(ar1, lags=20, ax=axes[1], alpha=0.05)
axes[1].set_title('Autocorrelation Function (ACF)', fontsize=14, fontweight='bold')

# PACF
plot_pacf(ar1, lags=20, ax=axes[2], alpha=0.05)
axes[2].set_title('Partial Autocorrelation Function (PACF)', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

Interpretation:

  • ACF: Exponential decay (AR process characteristic)
  • PACF: Lag 1 significant, others insignificant (identified as AR(1))

3. Time Series Model Family

Time Series Models
├── Classical Decomposition Models
│   ├── Additive Model: Y = T + S + R
│   └── Multiplicative Model: Y = T × S × R

├── Exponential Smoothing
│   ├── Simple Exponential Smoothing (SES)
│   ├── Holt Linear Trend
│   └── Holt-Winters Seasonal

├── ARIMA Family
│   ├── AR (Autoregressive)
│   ├── MA (Moving Average)
│   ├── ARMA
│   ├── ARIMA (with differencing)
│   └── SARIMA (seasonal)

├── Vector Models
│   ├── VAR (Vector Autoregression)
│   ├── VECM (Vector Error Correction Model)
│   └── Structural VAR (SVAR)

└── Advanced Models
    ├── GARCH (Volatility Models)
    ├── State Space Models
    └── Prophet (Facebook)

Event Study Methodology

Definition and Applications

Event Study: Evaluating the impact of a specific, dated event on an outcome variable by comparing it against a "normal" counterfactual benchmark

Classic Application Scenarios:

  1. Financial Markets:

    • Impact of mergers and acquisitions on stock prices
    • Impact of earnings announcements on stock returns
    • Impact of regulatory policies on market volatility
  2. Public Policy:

    • Impact of minimum wage laws on employment
    • Impact of environmental regulations on pollution
    • Impact of education reforms on test scores
  3. Natural Disasters:

    • Impact of earthquakes on economic activity
    • Impact of epidemics on consumer behavior

Basic Framework of Event Study

Timeline
├── Estimation Window
│   └── Establish "normal" benchmark

├── Event Window
│   ├── Pre-event
│   ├── Event Day
│   └── Post-event

└── Post-event Window
    └── Long-term effect evaluation

Core Metrics:

  1. Abnormal Return (AR): $AR_{it} = R_{it} - E[R_{it}]$, the actual return minus the model-implied "normal" return
  2. Cumulative Abnormal Return (CAR): $CAR_i(t_1, t_2) = \sum_{t=t_1}^{t_2} AR_{it}$

Illustration:

python
# Simulate event study
np.random.seed(123)
n = 200
event_day = 100

# Normal period returns
normal_returns = np.random.normal(0.001, 0.02, n)

# Add abnormal returns from event day onwards
abnormal_effect = 0.05
normal_returns[event_day:] += abnormal_effect

# Calculate cumulative returns
cumulative_returns = np.cumsum(normal_returns)

fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Daily returns
axes[0].plot(normal_returns, alpha=0.7)
axes[0].axvline(x=event_day, color='r', linestyle='--', linewidth=2, label='Event Day')
axes[0].axhline(y=0, color='black', linestyle='-', alpha=0.3)
axes[0].set_title('Daily Returns', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Returns')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Cumulative returns (the post-event drift reflects the abnormal effect)
axes[1].plot(cumulative_returns, linewidth=2)
axes[1].axvline(x=event_day, color='r', linestyle='--', linewidth=2, label='Event Day')
axes[1].set_title('Cumulative Returns', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Time (Days)')
axes[1].set_ylabel('Cumulative Returns')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
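The simulation above plots raw cumulative returns; an actual event study computes abnormal returns against an estimation-window benchmark. A minimal sketch using the mean-adjusted model on the same simulated data (the window lengths here are illustrative choices, not fixed conventions):

```python
import numpy as np

# Same simulated data as above
np.random.seed(123)
n, event_day = 200, 100
returns = np.random.normal(0.001, 0.02, n)
returns[event_day:] += 0.05  # inject the event effect

# Mean-adjusted model: the "normal" return is the estimation-window mean
estimation_window = returns[:event_day - 10]   # ends 10 days before the event
normal_mean = estimation_window.mean()
sigma = estimation_window.std(ddof=1)

# Abnormal returns over a [-10, +10] event window around the event day
event_window = returns[event_day - 10:event_day + 11]
abnormal = event_window - normal_mean
car = abnormal.cumsum()

# Simple t-statistic for the CAR over an L-day window: CAR / (sigma * sqrt(L))
L = len(event_window)
t_stat = car[-1] / (sigma * np.sqrt(L))
print(f"CAR = {car[-1]:.3f}, t = {t_stat:.2f}")
```

Section 4 replaces the mean-adjusted benchmark with the market model and adds rank-based tests.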

Chapter Structure

Section 1: Time Series Basics

  • Characteristics of time series data
  • Time series processing in Python (pandas)
  • Data stationarity testing (ADF, KPSS, PP)
  • Differencing and transformation
  • Case: Stationarity analysis of China's GDP growth rate

Section 2: Time Series Decomposition

  • Classical decomposition methods (additive/multiplicative models)
  • STL decomposition (Seasonal-Trend Decomposition using Loess)
  • Seasonal adjustment (X-13ARIMA-SEATS)
  • Trend extraction and filtering (HP filter, BK filter)
  • Case: Seasonal adjustment of Consumer Price Index (CPI)

Section 3: ARIMA Models

  • AR, MA, ARMA models
  • Unit root and differencing
  • ARIMA model identification, estimation, diagnosis
  • Model selection (AIC, BIC)
  • Forecasting and forecast intervals
  • SARIMA (Seasonal ARIMA)
  • Case: Unemployment rate forecasting

Section 4: Event Study Methodology

  • Basic framework of event studies
  • Normal return models (market model, mean-adjusted model)
  • Calculation of abnormal returns
  • Significance testing (t-test, rank test)
  • Multiple event studies
  • Case: Impact of merger announcements on stock prices

Section 5: Summary and Review

  • Knowledge system summary
  • 10 high-difficulty programming exercises
  • Classic literature recommendations
  • Learning path guide

Python Toolkits

Core Libraries

| Library | Main Functions | Installation |
| --- | --- | --- |
| pandas | Time series data processing | pip install pandas |
| statsmodels | ARIMA, VAR, cointegration | pip install statsmodels |
| scipy | Statistical testing | pip install scipy |
| matplotlib | Visualization | pip install matplotlib |
| seaborn | Advanced visualization | pip install seaborn |

Specialized Libraries

| Library | Main Functions | Installation |
| --- | --- | --- |
| arch | GARCH models | pip install arch |
| pmdarima | Auto ARIMA | pip install pmdarima |
| prophet | Facebook forecasting tool | pip install prophet |
| ruptures | Breakpoint detection | pip install ruptures |

Basic Setup

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, kpss, acf, pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Chinese font settings
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']  # macOS
# plt.rcParams['font.sans-serif'] = ['SimHei']  # Windows
plt.rcParams['axes.unicode_minus'] = False

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['figure.dpi'] = 100

# pandas display settings
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 20)
pd.set_option('display.width', 1000)

Classic Literature

Must-Read Papers

  1. Box, G. E. P., & Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control

    • Foundational work on time series analysis
    • Systematic introduction to ARIMA models
  2. Dickey, D. A., & Fuller, W. A. (1979). "Distribution of the Estimators for Autoregressive Time Series with a Unit Root"

    • ADF unit root test
    • Standard method for stationarity testing
  3. Fama, E. F., Fisher, L., Jensen, M. C., & Roll, R. (1969). "The Adjustment of Stock Prices to New Information"

    • Seminal work on event study methodology
    • Empirical support for efficient market hypothesis
  4. Engle, R. F., & Granger, C. W. J. (1987). "Co-integration and Error Correction"

    • Cointegration theory (Nobel Prize in Economics)
    • Long-run equilibrium relationships

Applied Literature

  1. Card, D., & Krueger, A. B. (1994). "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania"

    • Difference-in-differences (DID) method
    • Natural experiment design
  2. Bertrand, M., Duflo, E., & Mullainathan, S. (2004). "How Much Should We Trust Differences-In-Differences Estimates?"

    • Serial correlation issues in DID
    • Clustered standard errors

Learning Path Recommendations

Beginner (1-2 weeks)

Goal: Master basic time series data processing

Learning Content:

  • Section 1: Stationarity concepts, ADF test
  • First half of Section 2: Time series decomposition
  • Practice: Using real data (GDP, CPI)

Recommended Resources:

  • Wooldridge (2020): Chapter 10 "Basic Regression Analysis with Time Series Data"

Intermediate Learning (3-4 weeks)

Goal: Independently build ARIMA models

Learning Content:

  • Second half of Section 2: STL decomposition
  • Section 3: Complete ARIMA modeling process
  • Practice: Forecasting unemployment rate, inflation rate

Recommended Resources:

  • Stock & Watson (2020): Chapter 14 "Introduction to Time Series Regression"

Advanced Application (5-6 weeks)

Goal: Implement event studies and VAR analysis

Learning Content:

  • Section 4: Event study methodology
  • Practice: Policy effect evaluation

Recommended Resources:

  • Hamilton (1994): Time Series Analysis (Classic textbook)
  • Lütkepohl (2005): New Introduction to Multiple Time Series Analysis

Practical Recommendations

Common Pitfalls in Time Series Analysis

  1. Spurious Regression

    • Problem: Two non-stationary series may show high correlation but are actually unrelated
    • Solution: Test for stationarity, use differencing or cointegration
  2. Over-differencing

    • Problem: Unnecessary differencing introduces MA components
    • Solution: Use joint ADF and KPSS testing
  3. Ignoring Structural Breaks

    • Problem: Policy changes, crises lead to parameter changes
    • Solution: Chow Test, Bai-Perron breakpoint testing
  4. Small Sample Issues

    • Problem: Time series typically have small sample sizes
    • Solution: Interpret cautiously, use robust standard errors

Data Quality Checklist

  • [ ] Check for missing values (interpolation vs deletion)
  • [ ] Identify outliers (extreme events vs measurement errors)
  • [ ] Confirm time unit and frequency (daily, monthly, quarterly)
  • [ ] Check for seasonality (holiday effects)
  • [ ] Plot time series (trend, breakpoints)
  • [ ] Calculate basic statistics (mean, variance, skewness, kurtosis)
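Most of this checklist can be automated. A minimal sketch on a toy monthly series with one injected gap and one injected outlier (the z-score threshold and interpolation method are illustrative choices):

```python
import numpy as np
import pandas as pd

# Toy monthly series with one missing value and one outlier injected
idx = pd.date_range('2020-01-01', periods=24, freq='MS')
s = pd.Series(np.random.default_rng(1).normal(100, 5, 24), index=idx)
s.iloc[5] = np.nan     # injected gap
s.iloc[10] = 300.0     # injected outlier

# 1. Missing values
print("Missing:", s.isna().sum())

# 2. Outliers via a simple z-score rule
z = (s - s.mean()) / s.std()
print("Outlier dates:", list(s.index[z.abs() > 3]))

# 3. Frequency check and basic statistics
print("Inferred freq:", pd.infer_freq(s.index))
print(s.describe()[['mean', 'std', 'min', 'max']])

# 4. Time-weighted interpolation for the interior gap
s_filled = s.interpolate(method='time')
print("Missing after fill:", s_filled.isna().sum())
```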

Get Started

Ready to explore the mysteries of the time dimension?

Let's begin with Section 1: Time Series Basics!

Remember:

"The best way to predict the future is to understand the past."


Time Series Analysis: Understanding the past, predicting the future, evaluating causality!

Released under the MIT License. Content © Author.