7.2 Time Series Basics

"The only relevant test of the validity of a hypothesis is comparison of prediction with experience."— Milton Friedman, 1976 Nobel Laureate in Economics

Understanding the properties of time series data and stationarity testing

Section Objectives

Upon completing this section, you will be able to:

  • Understand the core characteristics of time series data
  • Process time series data using pandas
  • Master the concept of stationarity and its importance
  • Implement ADF, KPSS, and PP tests
  • Apply differencing and transformation techniques
  • Analyze the stationarity of real economic data

Characteristics of Time Series Data

Definition of Time Series

A time series is a set of observations arranged in chronological order:

$$\{y_t\}_{t=1}^{T} = \{y_1, y_2, \ldots, y_T\}$$

Core Characteristics of Time Series

| Characteristic | Description | Example |
|---|---|---|
| Time Dependence | Observations are correlated | Today's stock price depends on yesterday's |
| Trend | Long-term upward or downward movement | GDP's long-term growth trend |
| Seasonality | Fixed periodic fluctuations | Quarterly cycle in retail sales |
| Cyclicality | Non-fixed periodic fluctuations | Business cycles (4-7 years) |
| Randomness | Unpredictable fluctuations | White noise error term |
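
As a rough illustration of these characteristics, the sketch below builds a monthly series from a trend, a 12-month seasonal pattern, and white noise, then measures time dependence with the sample autocorrelation. All parameter values are arbitrary assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_months = 120
t = np.arange(n_months)

# Illustrative components (all parameter values are arbitrary assumptions)
trend = 0.1 * t                                # long-term upward movement
seasonality = 5 * np.sin(2 * np.pi * t / 12)   # fixed 12-month period
noise = rng.standard_normal(n_months)          # white-noise error term

y = pd.Series(
    100 + trend + seasonality + noise,
    index=pd.period_range("2010-01", periods=n_months, freq="M"),
)

# Time dependence: neighbouring observations are correlated, and the
# seasonal pattern shows up as a high correlation at lag 12.
lag1 = y.autocorr(lag=1)
lag12 = y.autocorr(lag=12)
print(f"lag-1 autocorrelation:  {lag1:.2f}")
print(f"lag-12 autocorrelation: {lag12:.2f}")
```

Both autocorrelations should come out well above zero here, because the trend and the seasonal cycle make nearby (and same-month) observations move together.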

Time Series Processing in Python

Core pandas Time Series Functions

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

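# Create time series data
# Note: pandas 2.2+ deprecates the alias 'M' in favor of 'ME' (month end);
# the same applies to 'Q' -> 'QE' in the resampling examples.
dates = pd.date_range(start='2010-01-01', end='2023-12-31', freq='M')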
np.random.seed(42)
values = 100 + np.cumsum(np.random.randn(len(dates)) * 2)

ts = pd.Series(values, index=dates)

print(ts.head())
print(f"\nData type: {type(ts.index)}")
print(f"Frequency: {ts.index.freq}")
print(f"Time range: {ts.index.min()} to {ts.index.max()}")

Time Series Slicing and Indexing

python
# Select by year
ts_2015 = ts['2015']

# Select by date range
ts_range = ts['2015-01':'2016-12']

# Filter by condition
ts_high = ts[ts > 110]

# Access specific date
value_jan_2015 = ts['2015-01-31']

print(f"2015 average: {ts_2015.mean():.2f}")
print(f"2015-2016 data points: {len(ts_range)}")

Resampling

python
# Downsample: monthly → quarterly
ts_quarterly = ts.resample('Q').mean()

# Upsample: monthly → daily (forward fill)
ts_daily = ts.resample('D').ffill()

# Downsample: different aggregation functions
ts_q_stats = pd.DataFrame({
    'mean': ts.resample('Q').mean(),
    'std': ts.resample('Q').std(),
    'min': ts.resample('Q').min(),
    'max': ts.resample('Q').max()
})

print("Quarterly statistics:")
print(ts_q_stats.head())

Rolling Windows

python
# Moving average
ts_ma_12 = ts.rolling(window=12).mean()

# Rolling standard deviation
ts_std_12 = ts.rolling(window=12).std()

# Visualization
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(ts, label='Original Data', alpha=0.6)
ax.plot(ts_ma_12, label='12-Month Moving Average', linewidth=2, color='red')
ax.fill_between(ts.index,
               ts_ma_12 - 2*ts_std_12,
               ts_ma_12 + 2*ts_std_12,
               alpha=0.2, color='red', label='±2σ Interval')
ax.set_title('Time Series with Moving Average', fontsize=14, fontweight='bold')
ax.set_xlabel('Time')
ax.set_ylabel('Value')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Lags and Differences

python
# Lag operator
ts_lag1 = ts.shift(1)
ts_lag12 = ts.shift(12)

# First-order difference
ts_diff1 = ts.diff(1)

# Seasonal difference
ts_diff12 = ts.diff(12)

# Log difference (approximate growth rate)
ts_log = np.log(ts)
ts_growth = ts_log.diff(1) * 100  # percentage

print(f"Average monthly growth rate: {ts_growth.mean():.3f}%")
print(f"Growth rate standard deviation: {ts_growth.std():.3f}%")

Stationarity

Why is Stationarity Important?

Foundation for Statistical Inference:

  • Non-stationary series: The mean, variance, or autocovariances change over time, so sample statistics do not estimate any fixed population quantity
  • Stationary series: These properties are constant, so the sample mean and variance are consistent estimators of the population parameters (under mild ergodicity conditions)

Spurious Regression:

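  • Regressing one non-stationary series on another, independent one can yield statistically significant but meaningless results
  • Granger & Newbold (1974) called these "spurious regressions"; Yule (1926) had earlier described "nonsense correlations" between time series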
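
A small Monte Carlo experiment makes the problem concrete. The sketch below is an illustration, not a reproduction of any published study: it regresses pairs of independent random walks on each other and counts how often a naive 5% t-test finds a "significant" slope, then repeats the exercise on the differenced series. The sample size, number of simulations, and helper name `slope_t_stat` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_t_stat(x, y):
    """t-statistic of the slope in an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

n_obs, n_sims = 100, 500
reject_levels = 0
reject_diffs = 0
for _ in range(n_sims):
    # Two random walks generated independently of each other
    x = np.cumsum(rng.standard_normal(n_obs))
    y = np.cumsum(rng.standard_normal(n_obs))
    reject_levels += int(abs(slope_t_stat(x, y)) > 1.96)
    reject_diffs += int(abs(slope_t_stat(np.diff(x), np.diff(y))) > 1.96)

rate_levels = reject_levels / n_sims
rate_diffs = reject_diffs / n_sims
print(f"Share of 'significant' slopes, levels:      {rate_levels:.2f}")
print(f"Share of 'significant' slopes, differences: {rate_diffs:.2f}")
```

In levels, the rejection rate should land far above the nominal 5%, while the regression on first differences should reject at roughly the nominal rate.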

Mathematical Definition of Stationarity

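Strictly Stationary:

$$F(y_{t_1}, \ldots, y_{t_k}) = F(y_{t_1+h}, \ldots, y_{t_k+h})$$

holds for all $t_1, \ldots, t_k$ and $h$, where $F$ denotes the joint distribution function.

Weakly Stationary/Covariance Stationary:

  1. Constant expectation: $E[y_t] = \mu$ (constant)
  2. Constant variance: $\mathrm{Var}(y_t) = \sigma^2$ (constant)
  3. Autocovariance depends only on lag: $\mathrm{Cov}(y_t, y_{t-k}) = \gamma_k$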

In practical applications, we typically focus on weak stationarity.
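
To see what the three conditions mean in practice, the sketch below simulates a long AR(1) process $y_t = \rho y_{t-1} + \varepsilon_t$ with unit-variance noise and compares sample moments with the standard theoretical values $\gamma_0 = 1/(1-\rho^2)$ and $\gamma_k = \rho^k \gamma_0$. The value $\rho = 0.7$, the sample size, and the helper `autocov` are assumptions for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.7, 200_000

eps = rng.standard_normal(n)
y = np.empty(n)
y[0] = eps[0] / np.sqrt(1 - rho**2)  # start from the stationary distribution
for t in range(1, n):
    y[t] = rho * y[t - 1] + eps[t]

def autocov(x, k):
    """Sample autocovariance at lag k."""
    xc = x - x.mean()
    return xc[k:] @ xc[: len(x) - k] / len(x)

gamma0_theory = 1 / (1 - rho**2)
gamma0_hat = autocov(y, 0)
rho1_hat = autocov(y, 1) / gamma0_hat

# Conditions 1 and 2: mean and variance look the same in both halves of the sample
first, second = y[: n // 2], y[n // 2 :]
print(f"mean (1st half, 2nd half): {first.mean():.3f}, {second.mean():.3f}")
print(f"var  (1st half, 2nd half): {first.var():.3f}, {second.var():.3f}")

# Condition 3: the autocovariance depends only on the lag
print(f"variance: sample {gamma0_hat:.3f} vs theory {gamma0_theory:.3f}")
print(f"lag-1 autocorrelation: sample {rho1_hat:.3f} vs theory {rho:.3f}")
```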

Visual Judgment of Stationarity

python
# Generate stationary and non-stationary series
np.random.seed(123)
n = 300

# Stationary series: AR(1) with |ρ| < 1
rho = 0.7
stationary = np.zeros(n)
stationary[0] = np.random.randn()
for t in range(1, n):
    stationary[t] = rho * stationary[t-1] + np.random.randn()

# Non-stationary series: random walk (unit root)
random_walk = np.cumsum(np.random.randn(n))

# Non-stationary series: deterministic trend
trend = np.arange(n) * 0.05 + np.random.randn(n) * 2

# Visualization
from statsmodels.graphics.tsaplots import plot_acf

fig, axes = plt.subplots(3, 2, figsize=(14, 10))

# Left column: time series plots; right column: ACF plots
for i, (data, title) in enumerate([(stationary, 'Stationary: AR(1), ρ=0.7'),
                                     (random_walk, 'Non-stationary: Random Walk'),
                                     (trend, 'Non-stationary: Deterministic Trend')]):
    axes[i, 0].plot(data, linewidth=1.5)
    axes[i, 0].set_title(title, fontsize=12, fontweight='bold')
    axes[i, 0].set_ylabel('Value')
    axes[i, 0].grid(True, alpha=0.3)

    # ACF plot
    plot_acf(data, lags=40, ax=axes[i, 1], alpha=0.05)
    axes[i, 1].set_title(f'ACF: {title.split(":")[1].strip()}', fontsize=12)

axes[2, 0].set_xlabel('Time')
axes[2, 1].set_xlabel('Lag')

plt.tight_layout()
plt.show()

Key Observations:

  • ACF of stationary series decays rapidly
  • ACF of random walk stays close to 1 at short lags and decays slowly
  • ACF of trend series also decays slowly
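
The same contrast can be read off numerically. The sketch below computes the sample ACF with plain NumPy for a freshly simulated AR(1) series and a random walk; the helper `sample_acf`, the seed, and the sample size are assumptions for this example and are independent of the plotting code above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

def sample_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([1.0] + [x[k:] @ x[:-k] / denom for k in range(1, nlags + 1)])

# Stationary AR(1) with rho = 0.7
eps = rng.standard_normal(n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.7 * ar1[t - 1] + eps[t]

# Random walk (unit root)
random_walk = np.cumsum(rng.standard_normal(n))

acf_ar1 = sample_acf(ar1, 20)
acf_rw = sample_acf(random_walk, 20)
for k in (1, 5, 10, 20):
    print(f"lag {k:2d}: AR(1) {acf_ar1[k]:6.3f}   random walk {acf_rw[k]:6.3f}")
```

For the AR(1) series the autocorrelation should shrink roughly geometrically with the lag, while the random walk's stays high across all printed lags.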

Stationarity Tests

1. ADF Test (Augmented Dickey-Fuller Test)

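Null Hypothesis: $H_0$: Series has a unit root (non-stationary)
Alternative Hypothesis: $H_1$: Series has no unit root (stationary)

Test Equation:

$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p} \delta_i \Delta y_{t-i} + \varepsilon_t$$

Test $H_0: \gamma = 0$ (unit root) vs $H_1: \gamma < 0$ (stationary)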
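
To show what the test computes, the sketch below runs the simplest version of the regression by hand: $\Delta y_t = \alpha + \gamma y_{t-1} + \varepsilon_t$, with a constant but no trend and no lagged differences ($p = 0$). The t-statistic on $\gamma$ is compared with $-2.86$, the asymptotic 5% Dickey-Fuller critical value for the constant-only case, rather than with the usual normal critical value. The helper name `dickey_fuller_t` and the simulated series are assumptions for this illustration; in practice `adfuller` handles lag selection and p-values.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

def dickey_fuller_t(y):
    """t-statistic on gamma in  dy_t = alpha + gamma * y_{t-1} + e_t."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    se_gamma = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se_gamma

# Stationary AR(1) with rho = 0.5, so the true gamma is rho - 1 = -0.5
eps = rng.standard_normal(n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.5 * ar1[t - 1] + eps[t]

# Random walk, so the true gamma is 0
random_walk = np.cumsum(rng.standard_normal(n))

CRIT_5PCT = -2.86  # asymptotic Dickey-Fuller critical value, constant only
t_ar1 = dickey_fuller_t(ar1)
t_rw = dickey_fuller_t(random_walk)
print(f"AR(1):       t = {t_ar1:7.2f}  reject unit root: {t_ar1 < CRIT_5PCT}")
print(f"Random walk: t = {t_rw:7.2f}  reject unit root: {t_rw < CRIT_5PCT}")
```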

python
from statsmodels.tsa.stattools import adfuller

def adf_test(series, name=''):
    """Perform ADF test and print results"""
    result = adfuller(series, autolag='AIC')

    print(f'\n{"="*60}')
    print(f'ADF Test Results: {name}')
    print(f'{"="*60}')
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print(f'Lags Used: {result[2]}')
    print(f'Number of Observations: {result[3]}')
    print('Critical Values:')
    for key, value in result[4].items():
        print(f'  {key}: {value:.4f}')

    if result[1] <= 0.05:
        print(f"\nConclusion: Reject null hypothesis (p={result[1]:.4f}) → Series is stationary ✓")
    else:
        print(f"\nConclusion: Cannot reject null hypothesis (p={result[1]:.4f}) → Series is non-stationary ✗")

    return result

# Test stationary and non-stationary series
adf_test(stationary, 'AR(1) Series')
adf_test(random_walk, 'Random Walk')
adf_test(trend, 'Trend Series')

2. KPSS Test (Kwiatkowski-Phillips-Schmidt-Shin Test)

Null Hypothesis: $H_0$: Series is stationary
Alternative Hypothesis: $H_1$: Series is non-stationary (has a unit root)

⚠️ Note: KPSS has the opposite null hypothesis from ADF!

python
from statsmodels.tsa.stattools import kpss

def kpss_test(series, name='', regression='c'):
    """
    Perform KPSS test
    regression: 'c' (constant), 'ct' (constant+trend)
    """
    result = kpss(series, regression=regression, nlags='auto')

    print(f'\n{"="*60}')
    print(f'KPSS Test Results: {name}')
    print(f'{"="*60}')
    print(f'KPSS Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print(f'Lags Used: {result[2]}')
    print('Critical Values:')
    for key, value in result[3].items():
        print(f'  {key}: {value:.4f}')

    if result[1] >= 0.05:
        print(f"\nConclusion: Cannot reject null hypothesis (p={result[1]:.4f}) → Series is stationary ✓")
    else:
        print(f"\nConclusion: Reject null hypothesis (p={result[1]:.4f}) → Series is non-stationary ✗")

    return result

# Test
kpss_test(stationary, 'AR(1) Series')
kpss_test(random_walk, 'Random Walk')
kpss_test(trend, 'Trend Series', regression='ct')

3. PP Test (Phillips-Perron Test)

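Principle: Tests the same unit-root null hypothesis as ADF, but instead of adding lagged differences to the regression it corrects the test statistic non-parametrically (with a Newey-West type long-run variance estimate) to handle serial correlation and heteroskedasticity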

python
# statsmodels does not provide a Phillips-Perron test; the `arch` package does.
# Install it with: pip install arch
try:
    from arch.unitroot import PhillipsPerron
except ImportError:
    PhillipsPerron = None

def pp_test_custom(series, name=''):
    """Perform PP test and print results (requires the arch package)"""
    if PhillipsPerron is None:
        print("\n⚠️ The PP test requires the arch package: pip install arch")
        return None

    # trend='c' includes a constant; with lags left unset, arch chooses the
    # Newey-West truncation lag automatically
    result = PhillipsPerron(series, trend='c')

    print(f'\n{"="*60}')
    print(f'PP Test Results: {name}')
    print(f'{"="*60}')
    print(f'PP Statistic: {result.stat:.4f}')
    print(f'p-value: {result.pvalue:.4f}')
    print(f'Lags Used: {result.lags}')
    print('Critical Values:')
    for key, value in result.critical_values.items():
        print(f'  {key}: {value:.4f}')

    if result.pvalue <= 0.05:
        print(f"\nConclusion: Reject null hypothesis (p={result.pvalue:.4f}) → Series is stationary ✓")
    else:
        print(f"\nConclusion: Cannot reject null hypothesis (p={result.pvalue:.4f}) → Series is non-stationary ✗")

    return result

pp_test_custom(stationary, 'AR(1) Series')
pp_test_custom(random_walk, 'Random Walk')

Comparison of Test Methods

| Test | Null Hypothesis | Advantage | Use Case |
|---|---|---|---|
| ADF | Non-stationary | Most common, robust | General testing |
| KPSS | Stationary | Complements ADF | Use jointly with ADF |
| PP | Non-stationary | Handles heteroskedasticity | High-frequency financial data |

Best Practice: Use both ADF and KPSS

| ADF | KPSS | Conclusion |
|---|---|---|
| Reject H0 (stationary) | Do not reject H0 (stationary) | ✓ Confirmed stationary |
| Do not reject H0 (non-stationary) | Reject H0 (non-stationary) | ✗ Confirmed non-stationary |
| Reject H0 | Reject H0 | ⚠️ Conflicting results, check data |
| Do not reject H0 | Do not reject H0 | ⚠️ Uncertain, needs further analysis |
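
The decision table can be written as a small helper. This is a minimal sketch: the function name `joint_stationarity_verdict`, the return labels, and the 5% default significance level are assumptions for illustration.

```python
def joint_stationarity_verdict(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF and KPSS p-values following the decision table above."""
    adf_rejects = adf_pvalue < alpha    # rejects "unit root"
    kpss_rejects = kpss_pvalue < alpha  # rejects "stationary"

    if adf_rejects and not kpss_rejects:
        return "stationary"
    if not adf_rejects and kpss_rejects:
        return "non-stationary"
    if adf_rejects and kpss_rejects:
        return "conflicting"
    return "uncertain"

# Hypothetical p-values, one pair per row of the table
for adf_p, kpss_p in [(0.01, 0.10), (0.60, 0.01), (0.01, 0.01), (0.30, 0.10)]:
    verdict = joint_stationarity_verdict(adf_p, kpss_p)
    print(f"ADF p={adf_p:.2f}, KPSS p={kpss_p:.2f} -> {verdict}")
```

The p-values would normally come from `adfuller(series)[1]` and `kpss(series)[1]`.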

Differencing and Transformation

First-Order Differencing

python
# Apply first-order differencing to random walk
rw_diff = pd.Series(random_walk).diff().dropna()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].plot(random_walk)
axes[0].set_title('Original Series: Random Walk (Non-stationary)', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Value')
axes[0].grid(True, alpha=0.3)

axes[1].plot(rw_diff)
axes[1].set_title('After First-Order Differencing (Stationary)', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Δy')
axes[1].axhline(y=0, color='r', linestyle='--', alpha=0.5)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Test stationarity after differencing
adf_test(rw_diff, 'Random Walk (After First-Order Differencing)')

Seasonal Differencing

$$\Delta_s y_t = y_t - y_{t-s}$$

where $s$ is the seasonal period (e.g., monthly data $s = 12$, quarterly data $s = 4$)

python
# Generate seasonal data
dates = pd.date_range('2010-01', periods=120, freq='M')
seasonal = 10 + 5*np.sin(2*np.pi*np.arange(120)/12) + np.random.randn(120)
trend_seasonal = seasonal + np.arange(120) * 0.1

ts_seasonal = pd.Series(trend_seasonal, index=dates)

# Apply differencing
ts_diff1 = ts_seasonal.diff(1)        # First-order difference (detrend)
ts_diff12 = ts_seasonal.diff(12)      # Seasonal difference (deseasonalize)
ts_diff1_12 = ts_seasonal.diff(1).diff(12)  # Combined differencing

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

axes[0, 0].plot(ts_seasonal)
axes[0, 0].set_title('Original Series (Trend + Seasonality)', fontsize=12, fontweight='bold')

axes[0, 1].plot(ts_diff1)
axes[0, 1].set_title('First-Order Difference (Detrended)', fontsize=12, fontweight='bold')
axes[0, 1].axhline(y=0, color='r', linestyle='--', alpha=0.5)

axes[1, 0].plot(ts_diff12)
axes[1, 0].set_title('Seasonal Difference (Deseasonalized)', fontsize=12, fontweight='bold')
axes[1, 0].axhline(y=0, color='r', linestyle='--', alpha=0.5)

axes[1, 1].plot(ts_diff1_12.dropna())
axes[1, 1].set_title('Combined Differencing (Detrended + Deseasonalized)', fontsize=12, fontweight='bold')
axes[1, 1].axhline(y=0, color='r', linestyle='--', alpha=0.5)

for ax in axes.flat:
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Log Transformation

Uses:

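  1. Stabilize variance (when fluctuations grow in proportion to the level, the variance of the logged series is approximately constant)
  2. Convert multiplicative model to additive model
  3. Approximate growth rate calculation (the log difference approximates the percentage change when changes are small)

python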
# Generate exponentially growing data with multiplicative noise, so the
# fluctuations scale with the level (heteroskedastic in levels)
exp_data = 100 * np.exp(0.05 * np.arange(100) + np.random.randn(100) * 0.1)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].plot(exp_data)
axes[0].set_title('Original Data (Heteroskedastic)', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Value')

axes[1].plot(np.log(exp_data))
axes[1].set_title('After Log Transformation (Variance Stabilized)', fontsize=12, fontweight='bold')
axes[1].set_ylabel('log(Value)')

for ax in axes:
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate growth rate
growth_rate = pd.Series(np.log(exp_data)).diff() * 100
print(f"Average growth rate: {growth_rate.mean():.2f}%")
print(f"Growth rate standard deviation: {growth_rate.std():.2f}%")
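
How good is the growth-rate approximation? Since $\log y_t - \log y_{t-1} = \log(1 + g)$ for a simple growth rate $g$, the gap between the two can be tabulated directly. The growth rates below are arbitrary values chosen for illustration.

```python
import numpy as np

growth = np.array([0.01, 0.05, 0.10, 0.25, 0.50])  # simple growth rates g
log_diff = np.log1p(growth)                         # log(1 + g), the log difference

for g, ld in zip(growth, log_diff):
    print(f"g = {g:5.0%}   log difference = {ld:.4f}   gap = {g - ld:.4f}")
```

The gap is negligible for growth of a few percent and widens as growth gets larger, so log differences are best read as approximate growth rates for small changes.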

Box-Cox Transformation

Automatically find optimal power transformation parameter:

python
from scipy.stats import boxcox

# Box-Cox transformation
data_positive = exp_data - exp_data.min() + 1  # Ensure positive values
transformed, lambda_opt = boxcox(data_positive)

print(f"Optimal λ parameter: {lambda_opt:.4f}")

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

axes[0].plot(data_positive)
axes[0].set_title('Original Data', fontsize=12, fontweight='bold')

axes[1].plot(transformed)
axes[1].set_title(f'Box-Cox Transformation (λ={lambda_opt:.3f})', fontsize=12, fontweight='bold')

axes[2].hist(data_positive, bins=30, alpha=0.5, label='Original', density=True)
axes[2].hist(transformed, bins=30, alpha=0.5, label='Transformed', density=True)
axes[2].set_title('Distribution Comparison', fontsize=12, fontweight='bold')
axes[2].legend()

for ax in axes:
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
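
For reference, the Box-Cox transform for positive data is $(y^\lambda - 1)/\lambda$ when $\lambda \neq 0$ and $\log y$ when $\lambda = 0$. The sketch below writes the transform and its inverse by hand with NumPy to make the role of $\lambda$ explicit; the function names and sample values are assumptions, and `scipy.stats.boxcox` remains the practical choice because it also estimates $\lambda$.

```python
import numpy as np

def boxcox_transform(y, lam):
    """Box-Cox transform of positive data for a given lambda."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y**lam - 1) / lam

def boxcox_inverse(z, lam):
    """Undo boxcox_transform for the same lambda."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return (lam * z + 1) ** (1 / lam)

y = np.array([1.0, 2.0, 5.0, 10.0, 50.0])  # hypothetical positive data
for lam in (1.0, 0.5, 0.0, -0.5):
    z = boxcox_transform(y, lam)
    print(f"lambda = {lam:4.1f}: {np.round(z, 3)}")
```

With $\lambda = 1$ the data are only shifted by one, $\lambda = 0$ is the log transformation from the previous subsection, and values in between compress large observations progressively more.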

Complete Case Study: China GDP Growth Rate Stationarity Analysis

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Simulate China GDP data (1978-2023, in 100 million yuan)
np.random.seed(2024)
years = pd.date_range('1978', '2023', freq='YS')  # year-start dates: 46 annual points, 1978 through 2023
n = len(years)

# GDP level: exponential growth + cycle + random disturbance
# The small negative quadratic term slows trend growth gradually without reversing it
t = np.arange(n)
log_gdp = 8.5 + 0.12*t - 0.0005*t**2 + 0.2*np.sin(2*np.pi*t/10) + np.random.randn(n)*0.05
gdp = np.exp(log_gdp)

# GDP growth rate
gdp_growth = pd.Series(log_gdp).diff() * 100
gdp_growth = gdp_growth.dropna()

df_gdp = pd.DataFrame({
    'year': years,
    'gdp': gdp,
    'log_gdp': log_gdp,
    'growth_rate': [np.nan] + list(gdp_growth)
})

# 1. Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# GDP level
axes[0, 0].plot(df_gdp['year'], df_gdp['gdp'], linewidth=2)
axes[0, 0].set_title('China GDP Level (1978-2023)', fontsize=12, fontweight='bold')
axes[0, 0].set_ylabel('GDP (100 million yuan)')
axes[0, 0].grid(True, alpha=0.3)

# log(GDP)
axes[0, 1].plot(df_gdp['year'], df_gdp['log_gdp'], linewidth=2, color='orange')
axes[0, 1].set_title('log(GDP)', fontsize=12, fontweight='bold')
axes[0, 1].set_ylabel('log(GDP)')
axes[0, 1].grid(True, alpha=0.3)

# GDP growth rate
axes[1, 0].plot(df_gdp['year'][1:], df_gdp['growth_rate'][1:], linewidth=2, color='green')
axes[1, 0].axhline(y=df_gdp['growth_rate'].mean(), color='r', linestyle='--',
                  label=f'Average: {df_gdp["growth_rate"].mean():.2f}%')
axes[1, 0].set_title('GDP Growth Rate', fontsize=12, fontweight='bold')
axes[1, 0].set_ylabel('Growth Rate (%)')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Growth rate distribution
axes[1, 1].hist(df_gdp['growth_rate'].dropna(), bins=15, edgecolor='black', alpha=0.7)
axes[1, 1].axvline(x=df_gdp['growth_rate'].mean(), color='r', linestyle='--', linewidth=2)
axes[1, 1].set_title('Growth Rate Distribution', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Growth Rate (%)')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 2. Descriptive statistics
print("\n" + "="*60)
print("GDP Growth Rate Descriptive Statistics")
print("="*60)
print(df_gdp['growth_rate'].describe())

# 3. ACF/PACF plots
fig, axes = plt.subplots(3, 2, figsize=(14, 12))

# GDP level
plot_acf(df_gdp['gdp'].dropna(), lags=20, ax=axes[0, 0], alpha=0.05)
axes[0, 0].set_title('ACF: GDP Level', fontsize=12, fontweight='bold')
plot_pacf(df_gdp['gdp'].dropna(), lags=20, ax=axes[0, 1], alpha=0.05)
axes[0, 1].set_title('PACF: GDP Level', fontsize=12, fontweight='bold')

# log(GDP)
plot_acf(df_gdp['log_gdp'].dropna(), lags=20, ax=axes[1, 0], alpha=0.05)
axes[1, 0].set_title('ACF: log(GDP)', fontsize=12, fontweight='bold')
plot_pacf(df_gdp['log_gdp'].dropna(), lags=20, ax=axes[1, 1], alpha=0.05)
axes[1, 1].set_title('PACF: log(GDP)', fontsize=12, fontweight='bold')

# GDP growth rate
plot_acf(df_gdp['growth_rate'].dropna(), lags=20, ax=axes[2, 0], alpha=0.05)
axes[2, 0].set_title('ACF: GDP Growth Rate', fontsize=12, fontweight='bold')
plot_pacf(df_gdp['growth_rate'].dropna(), lags=20, ax=axes[2, 1], alpha=0.05)
axes[2, 1].set_title('PACF: GDP Growth Rate', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()

# 4. Stationarity tests
def comprehensive_stationarity_test(series, name):
    """Comprehensive stationarity test"""
    print(f"\n{'='*70}")
    print(f"Stationarity Test: {name}")
    print(f"{'='*70}")

    # ADF test
    adf_result = adfuller(series.dropna(), autolag='AIC')
    print(f"\n【ADF Test】(H0: Non-stationary)")
    print(f"  Statistic: {adf_result[0]:.4f}")
    print(f"  p-value: {adf_result[1]:.4f}")
    print(f"  Critical Value (5%): {adf_result[4]['5%']:.4f}")
    adf_conclusion = "Stationary ✓" if adf_result[1] < 0.05 else "Non-stationary ✗"
    print(f"  Conclusion: {adf_conclusion}")

    # KPSS test
    kpss_result = kpss(series.dropna(), regression='c', nlags='auto')
    print(f"\n【KPSS Test】(H0: Stationary)")
    print(f"  Statistic: {kpss_result[0]:.4f}")
    print(f"  p-value: {kpss_result[1]:.4f}")
    print(f"  Critical Value (5%): {kpss_result[3]['5%']:.4f}")
    kpss_conclusion = "Stationary ✓" if kpss_result[1] > 0.05 else "Non-stationary ✗"
    print(f"  Conclusion: {kpss_conclusion}")

    # Combined judgment
    print(f"\n【Combined Conclusion】")
    if adf_result[1] < 0.05 and kpss_result[1] > 0.05:
        print(f"  Both tests consistently support: Series is stationary ✓")
    elif adf_result[1] >= 0.05 and kpss_result[1] <= 0.05:
        print(f"  Both tests consistently support: Series is non-stationary ✗")
    else:
        print(f"  Test results are conflicting, further analysis needed ⚠️")

# Test GDP level
comprehensive_stationarity_test(df_gdp['gdp'], 'GDP Level')

# Test log(GDP)
comprehensive_stationarity_test(df_gdp['log_gdp'], 'log(GDP)')

# Test GDP growth rate
comprehensive_stationarity_test(df_gdp['growth_rate'], 'GDP Growth Rate')

# 5. Differencing effect comparison
print(f"\n{'='*70}")
print("First-Order Differencing Effect")
print(f"{'='*70}")

gdp_diff = df_gdp['gdp'].diff().dropna()
comprehensive_stationarity_test(gdp_diff, 'GDP Level (After First-Order Differencing)')

# 6. Visualization summary
fig, axes = plt.subplots(3, 1, figsize=(14, 10))

axes[0].plot(df_gdp['year'], df_gdp['gdp'])
axes[0].set_title('GDP Level (Non-stationary)', fontsize=12, fontweight='bold')
axes[0].set_ylabel('GDP (100 million yuan)')
axes[0].grid(True, alpha=0.3)

axes[1].plot(df_gdp['year'][1:], df_gdp['growth_rate'][1:], color='green')
axes[1].axhline(y=0, color='black', linestyle='-', linewidth=0.8)
axes[1].set_title('GDP Growth Rate (Stationary)', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Growth Rate (%)')
axes[1].grid(True, alpha=0.3)

axes[2].plot(df_gdp['year'][1:], gdp_diff, color='orange')
axes[2].axhline(y=0, color='black', linestyle='-', linewidth=0.8)
axes[2].set_title('GDP First-Order Difference (Stationary)', fontsize=12, fontweight='bold')
axes[2].set_ylabel('Δ GDP')
axes[2].set_xlabel('Year')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n" + "="*70)
print("✓ Case analysis complete!")
print("="*70)
print("\nExpected findings (confirm each against the test output above):")
print("  1. GDP level is non-stationary")
print("  2. log(GDP) is non-stationary and is usually modeled as I(1)")
print("  3. GDP growth rate (first difference of log GDP) is usually I(0)")
print("  4. Differencing the raw level removes the trend, but log differences are")
print("     preferred because the size of level changes still grows with GDP")
print("\nPolicy implications:")
print("  - Analyzing GDP growth rate (rather than level) is more reliable")
print("  - Regression analysis requires differencing or cointegration testing")
print("  - Economic shocks' impact on growth rate is temporary")

Section Summary

Core Concepts

| Concept | Definition | Importance |
|---|---|---|
| Stationarity | Statistical properties don't change over time | Foundation for statistical inference |
| Unit Root | AR(1) coefficient = 1, series non-stationary | Test for non-stationarity |
| Differencing | $\Delta y_t = y_t - y_{t-1}$ | Convert I(1) to I(0) |
| Log Transformation | Stabilize variance, approximate growth rate | Handle exponential growth data |

Practice Checklist

  • [ ] Plot time series, observe trend/seasonality
  • [ ] Plot ACF/PACF, check autocorrelation structure
  • [ ] Use both ADF and KPSS tests for stationarity
  • [ ] If non-stationary, try differencing or log transformation
  • [ ] Verify stationarity of transformed series
  • [ ] Record transformation steps (needed for inverse transformation in modeling)
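
The last item matters because forecasts made on a transformed series have to be mapped back to the original scale. The sketch below shows one way to invert a log-then-difference transformation, assuming the first log level was stored; the series values are hypothetical.

```python
import numpy as np
import pandas as pd

y = pd.Series([100.0, 104.0, 109.0, 113.0, 120.0, 126.0])  # hypothetical levels

# Forward transformation: log, then first difference
log_y = np.log(y)
growth = log_y.diff().dropna()

# Inverse transformation: cumulate the differences, add back the first
# log level (which must have been recorded), then exponentiate
recovered_log = pd.concat([log_y.iloc[:1], log_y.iloc[0] + growth.cumsum()])
recovered = np.exp(recovered_log)

print(recovered.round(2).tolist())
```

Without the stored starting value, the cumulative sum recovers the series only up to an unknown constant in logs, which is an unknown scale factor in levels.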

Common Errors

Avoid:

  • Using only ADF test (it has low power when the series is stationary but highly persistent, so it may reach wrong conclusions)
  • Over-differencing (e.g., differencing I(1) series twice)
  • Ignoring seasonal patterns
  • Differencing already stationary series
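
The cost of differencing a series that is already stationary can be seen with white noise. For independent noise $\varepsilon_t$ with variance 1, the difference $\varepsilon_t - \varepsilon_{t-1}$ has variance 2 and a lag-1 autocorrelation of $-0.5$, so differencing adds variance and introduces artificial negative dependence. The sketch below checks this by simulation; the seed, sample size, and helper name are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(4)
white_noise = rng.standard_normal(20_000)   # already stationary
over_differenced = np.diff(white_noise)     # unnecessary differencing

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    xc = x - x.mean()
    return (xc[1:] @ xc[:-1]) / (xc @ xc)

print(f"white noise:      var = {white_noise.var():.2f}, "
      f"lag-1 autocorr = {lag1_autocorr(white_noise):+.2f}")
print(f"over-differenced: var = {over_differenced.var():.2f}, "
      f"lag-1 autocorr = {lag1_autocorr(over_differenced):+.2f}")
```

A strongly negative lag-1 autocorrelation (near $-0.5$) after differencing is a common warning sign that the series did not need that difference.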

Recommended:

  • ADF + KPSS dual testing
  • Visual check of transformation effects
  • Understand economic meaning of data (e.g., growth rates typically stationary)

Next Section Preview

In the next section, we will learn how to decompose trend, seasonal, and random components of time series.

Master the foundations of time series!


Extended Reading

  1. Dickey, D. A., & Fuller, W. A. (1979). "Distribution of the estimators for autoregressive time series with a unit root." Journal of the American Statistical Association, 74(366a), 427-431.

  2. Kwiatkowski, D., et al. (1992). "Testing the null hypothesis of stationarity against the alternative of a unit root." Journal of Econometrics, 54(1-3), 159-178.

  3. Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press. (Chapters 15-17)

  4. Enders, W. (2014). Applied Econometric Time Series (4th ed.). Wiley. (Chapter 4)

Released under the MIT License. Content © Author.