
8.3 Fixed Effects Models

The Power of Differencing: The Ultimate Weapon for Eliminating Unobserved Heterogeneity



Section Objectives

  • Deeply understand the mathematical principles of Fixed Effects (FE) models
  • Master three FE estimation methods (within transformation, LSDV, first difference)
  • Distinguish between one-way FE and two-way FE
  • Use linearmodels.PanelOLS for FE regression
  • Understand FE identification assumptions and causal interpretation
  • Handle common pitfalls with FE (time-invariant variables, bad controls)
  • Complete case study: Wage determinants (Mincer equation)

Core Idea of Fixed Effects Models

The Origin of the Problem: Omitted Variable Bias

Recall the example from the previous section: studying the effect of education on wages.

True Model:

$$\log(\text{wage}_{it}) = \beta_0 + \beta_1 \, \text{educ}_{it} + \gamma \, \text{ability}_i + \epsilon_{it}$$

Problem:

  • $\text{ability}_i$ is unobservable and cannot be directly measured
  • $\text{ability}_i$ is correlated with $\text{educ}_{it}$ (smart people get more education)
  • If $\text{ability}_i$ is ignored, $\hat{\beta}_1$ will be biased

Traditional Solutions:

  1. Find a "perfect" proxy variable to measure ability → Nearly impossible
  2. Eliminate correlation through randomized experiments → Costly, impractical

Fixed Effects Solution: Use the time dimension of panel data to eliminate $\text{ability}_i$ through differencing


FE Core Intuition: Differencing Eliminates Fixed Effects

Suppose we observe the same person for two years:

Year 1:

$$y_{i1} = \beta x_{i1} + \alpha_i + \epsilon_{i1}$$

Year 2:

$$y_{i2} = \beta x_{i2} + \alpha_i + \epsilon_{i2}$$

Difference (Year 2 - Year 1):

$$y_{i2} - y_{i1} = \beta (x_{i2} - x_{i1}) + (\epsilon_{i2} - \epsilon_{i1})$$

The Magic: $\alpha_i$ is eliminated!

Intuition:

  • $\alpha_i$ (ability) doesn't change over the two years (it's fixed)
  • After differencing, we only use within-individual variation over time
  • Change in wage = change in education × rate of return

This is the essence of fixed effects!
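
To make this concrete, here is a minimal numeric sketch (the numbers are invented for illustration): with two periods, FE estimation reduces to a regression through the origin on the changes.

python
import numpy as np

# Two invented workers observed in two years
educ_y1, educ_y2 = np.array([12.0, 16.0]), np.array([13.0, 16.0])
wage_y1, wage_y2 = np.array([2.50, 2.90]), np.array([2.58, 2.91])  # log wages

d_educ = educ_y2 - educ_y1   # within-person change in education: [1, 0]
d_wage = wage_y2 - wage_y1   # within-person change in log wage

# OLS through the origin on the differenced data; worker 2, whose
# education did not change, contributes nothing to the estimate
beta = (d_educ @ d_wage) / (d_educ @ d_educ)
print(f"Differenced estimate of the return to education: {beta:.3f}")  # 0.080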


Mathematical Expression of Fixed Effects Models

General FE Model

$$y_{it} = \beta_1 x_{1,it} + \beta_2 x_{2,it} + \cdots + \beta_K x_{K,it} + \alpha_i + \epsilon_{it}$$

Symbol Definitions:

  • $i = 1, \dots, N$: Individual index
  • $t = 1, \dots, T$: Time index
  • $y_{it}$: Dependent variable
  • $x_{k,it}$: $k$-th independent variable
  • $\alpha_i$: Individual fixed effect (time-invariant, can be correlated with $x_{it}$)
  • $\epsilon_{it}$: Random error term (idiosyncratic error)

Key Assumptions:

  1. Strict Exogeneity: $E[\epsilon_{it} \mid x_{i1}, \dots, x_{iT}, \alpha_i] = 0$

  2. Fixed effects are time-invariant: $\alpha_{it} = \alpha_i$ for all $t$

  3. Correlation with the regressors is allowed: $\mathrm{Cov}(x_{it}, \alpha_i) \neq 0$ is permitted (this is what distinguishes FE from RE)


Three Estimation Methods for Fixed Effects

Method 1: Within Transformation (Demeaning)

Core Idea: Demean each variable

Step 1: Calculate Individual Means

$$\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}, \qquad \bar{x}_i = \frac{1}{T} \sum_{t=1}^{T} x_{it}$$

Note: $\bar{\alpha}_i = \alpha_i$ (because $\alpha_i$ doesn't vary over time)

Step 2: Subtract the Mean Equation from the Original Equation

Original equation:

$$y_{it} = \beta x_{it} + \alpha_i + \epsilon_{it}$$

Mean equation (averaging both sides over time):

$$\bar{y}_i = \beta \bar{x}_i + \alpha_i + \bar{\epsilon}_i$$

Difference (demeaning):

$$y_{it} - \bar{y}_i = \beta (x_{it} - \bar{x}_i) + (\epsilon_{it} - \bar{\epsilon}_i) \quad\Longleftrightarrow\quad \ddot{y}_{it} = \beta \ddot{x}_{it} + \ddot{\epsilon}_{it}$$

where $\ddot{y}_{it} = y_{it} - \bar{y}_i$ is the demeaned variable (similarly for $\ddot{x}_{it}$ and $\ddot{\epsilon}_{it}$)

Key Result: $\alpha_i$ is eliminated!

Step 3: OLS Estimation of the Demeaned Equation

This is the Fixed Effects Estimator (Within Estimator)


Python Implementation: Manual Within Transformation

python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulate data
np.random.seed(42)
data = []
for i in range(1, 101):  # 100 individuals
    alpha_i = np.random.normal(0, 1)  # Individual fixed effect
    for t in range(2015, 2020):  # 5 years
        x = 10 + t - 2015 + np.random.normal(0, 2)
        y = 5 + 2 * x + alpha_i + np.random.normal(0, 1)
        data.append({'id': i, 'year': t, 'y': y, 'x': x})

df = pd.DataFrame(data)

print("=" * 70)
print("Original Data")
print("=" * 70)
print(df.head(10))

# Manual within transformation
# Step 1: Calculate individual means
df['y_mean'] = df.groupby('id')['y'].transform('mean')
df['x_mean'] = df.groupby('id')['x'].transform('mean')

# Step 2: Demean
df['y_within'] = df['y'] - df['y_mean']
df['x_within'] = df['x'] - df['x_mean']

print("\n" + "=" * 70)
print("Demeaned Data (first 10 rows)")
print("=" * 70)
print(df[['id', 'year', 'y', 'y_within', 'x', 'x_within']].head(10))

# Step 3: OLS regression on demeaned data (no intercept!)
model_within = sm.OLS(df['y_within'], df['x_within']).fit()

print("\n" + "=" * 70)
print("Within Transformation FE Regression Results (manual implementation)")
print("=" * 70)
print(f"Coefficient: {model_within.params['x_within']:.4f}")
print(f"Standard error: {model_within.bse['x_within']:.4f}")
print(f"True parameter: 2.0000")

Key Observation:

  • After demeaning, each individual's mean becomes 0
  • Only within variation remains
  • No need for intercept in regression (intercept is 0 after demeaning)
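
A quick sanity check (assuming df from the block above is still in scope) confirms the first point:

python
# After demeaning, every individual's mean of the transformed variables
# should be zero up to floating-point error
print(df.groupby('id')['x_within'].mean().abs().max())  # ~1e-15
print(df.groupby('id')['y_within'].mean().abs().max())  # ~1e-15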

Method 2: Least Squares Dummy Variable (LSDV)

Core Idea: Add a dummy variable for each individual

$$y_{it} = \alpha_1 + \beta x_{it} + \sum_{j=2}^{N} \gamma_j D_{ji} + \epsilon_{it}$$

where $D_{ji} = 1$ if $i = j$, otherwise 0.

Interpretation:

  • $N - 1$ dummy variables (1st individual as the reference group)
  • $\gamma_j$: Individual $j$'s fixed effect difference relative to individual 1
  • $\alpha_i = \alpha_1 + \gamma_i$ (if $i \geq 2$) or $\alpha_i = \alpha_1$ (reference group)

Pros:

  • Can estimate each individual's fixed effect
  • Equivalent to the within transformation (coefficients are identical)

Cons:

  • If $N$ is large, many parameters must be estimated ($N - 1$ dummies plus $K$ slopes and the intercept)
  • Slow computation (but results identical to the within transformation)

Python Implementation:

python
import pandas as pd
import statsmodels.api as sm

# Create N-1 dummy variables (drop the first individual as the reference
# group; dtype=float so statsmodels receives numeric columns)
dummies = pd.get_dummies(df['id'], prefix='id', drop_first=True, dtype=float)

# Merge data
X_lsdv = pd.concat([df[['x']], dummies], axis=1)
X_lsdv = sm.add_constant(X_lsdv)

# OLS regression
model_lsdv = sm.OLS(df['y'], X_lsdv).fit()

print("=" * 70)
print("LSDV Regression Results")
print("=" * 70)
print(f"x coefficient: {model_lsdv.params['x']:.4f}")
print(f"Number of estimated fixed effects: {len(dummies.columns)}")

Note:

  • LSDV and the within transformation produce identical coefficients
  • But standard errors may differ slightly (degrees-of-freedom adjustment; see the check below)
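
The degrees-of-freedom adjustment can be verified directly. This sketch (assuming df, model_within, and model_lsdv from the blocks above) reconciles the two standard errors:

python
import numpy as np

# The naive within regression uses NT - K residual degrees of freedom,
# ignoring the N individual means that were estimated; LSDV correctly
# uses NT - N - K
N, NT, K = df['id'].nunique(), len(df), 1
correction = np.sqrt((NT - K) / (NT - N - K))

print(f"Naive within SE:     {model_within.bse['x_within']:.4f}")
print(f"Corrected within SE: {model_within.bse['x_within'] * correction:.4f}")
print(f"LSDV SE:             {model_lsdv.bse['x']:.4f}")  # matches the corrected SE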

Method 3: First Difference

Core Idea: Difference adjacent time periods

$$y_{it} - y_{i,t-1} = \beta (x_{it} - x_{i,t-1}) + (\epsilon_{it} - \epsilon_{i,t-1})$$

Abbreviated as:

$$\Delta y_{it} = \beta \, \Delta x_{it} + \Delta \epsilon_{it}$$

Difference from the Within Transformation:

  • Within transformation: $x_{it} - \bar{x}_i$ (subtract the mean of all periods)
  • First difference: $x_{it} - x_{i,t-1}$ (subtract the previous period's value)

When Are They Equivalent?

  • If $T = 2$ (only two periods), the two methods are completely equivalent
  • If $T > 2$, the within transformation is usually more efficient (when $\epsilon_{it}$ is serially uncorrelated)

When to Prefer First Difference?

  • If the error term $\epsilon_{it}$ follows a random walk (then $\Delta \epsilon_{it}$ is serially uncorrelated)
  • If the independent variables have severe serial correlation

Python Implementation:

python
# First difference
df_sorted = df.sort_values(['id', 'year'])
df_sorted['y_diff'] = df_sorted.groupby('id')['y'].diff()
df_sorted['x_diff'] = df_sorted.groupby('id')['x'].diff()

# Drop first period (no difference)
df_fd = df_sorted.dropna(subset=['y_diff', 'x_diff'])

# OLS regression (no intercept)
model_fd = sm.OLS(df_fd['y_diff'], df_fd['x_diff']).fit()

print("=" * 70)
print("First Difference FE Regression Results")
print("=" * 70)
print(f"Coefficient: {model_fd.params['x_diff']:.4f}")

linearmodels.PanelOLS: Professional Tool

Basic Syntax

python
from linearmodels.panel import PanelOLS

# Set panel index
df_panel = df.set_index(['id', 'year'])

# Fixed effects regression
model = PanelOLS(
    dependent=df_panel['y'],
    exog=df_panel[['x1', 'x2']],
    entity_effects=True,      # Individual fixed effects
    time_effects=False        # Time fixed effects (optional)
).fit(
    cov_type='clustered',     # Standard error type
    cluster_entity=True       # Cluster at individual level
)

print(model)

Parameter Details

1. entity_effects: Individual Fixed Effects

python
# Enable individual fixed effects (recommended)
entity_effects=True

Equivalent to within transformation for each variable
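
A quick consistency check (assuming df and model_within from the simulation earlier in this section): PanelOLS with entity_effects=True reproduces the manual within-transformation estimate.

python
from linearmodels.panel import PanelOLS

df_check = df.set_index(['id', 'year'])
fe_check = PanelOLS(df_check['y'], df_check[['x']], entity_effects=True).fit()
print(f"PanelOLS:      {fe_check.params['x']:.4f}")
print(f"Manual within: {model_within.params['x_within']:.4f}")  # identical point estimates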

2. time_effects: Time Fixed Effects

python
# Enable time fixed effects
time_effects=True

Controls for common time trends across all individuals (macro shocks, policy changes, etc.)

3. cov_type: Standard Error Type

| Option | Meaning | When to Use |
|---|---|---|
| 'unadjusted' | Classical OLS SE | Teaching only |
| 'robust' | Heteroskedasticity-robust SE | When heteroskedasticity is present |
| 'clustered' | Clustered SE | Standard choice for panel data ⭐ |
| 'kernel' | Driscoll-Kraay (Newey-West-style kernel) SE | Severe serial correlation |

Recommended:

python
cov_type='clustered',
cluster_entity=True  # Cluster at individual level

Complete Example: Wage Determinants

python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']
sns.set_style("whitegrid")

# Simulate data: wage panel
np.random.seed(123)

N = 500  # 500 workers
T = 7    # 7 years (2015-2021)

data = []
for i in range(N):
    # Individual fixed effect (ability, family background, etc.)
    ability = np.random.normal(0, 0.3)

    # Initial education level, correlated with ability -- this correlation
    # is the source of the omitted-variable bias in pooled OLS below
    education_0 = float(np.clip(np.round((12 + 6 * ability + np.random.normal(0, 1.5)) / 2) * 2, 10, 16))

    for t in range(T):
        year = 2015 + t

        # Education (may increase, e.g., night school)
        education = education_0 + (0.1 * t if np.random.rand() < 0.1 else 0)

        # Work experience
        experience = t

        # Union membership (may change over time)
        union = 1 if np.random.rand() < 0.3 else 0

        # Log wage
        # True parameters: education=0.08, experience=0.05, union=0.10
        log_wage = (1.5 + 0.08 * education + 0.05 * experience +
                    0.10 * union + ability + np.random.normal(0, 0.1))

        data.append({
            'id': i,
            'year': year,
            'log_wage': log_wage,
            'education': education,
            'experience': experience,
            'union': union,
            'ability': ability  # Actually unobservable
        })

df = pd.DataFrame(data)

print("=" * 70)
print("Data Summary")
print("=" * 70)
print(df[['log_wage', 'education', 'experience', 'union']].describe())
print(f"\nSample size: {len(df)}")
print(f"Number of individuals: {df['id'].nunique()}")
print(f"Number of time periods: {df['year'].nunique()}")

# Set panel index
df_panel = df.set_index(['id', 'year'])

# Method 1: Pooled OLS (biased)
import statsmodels.api as sm
X_pooled = sm.add_constant(df[['education', 'experience', 'union']])
model_pooled = sm.OLS(df['log_wage'], X_pooled).fit()

print("\n" + "=" * 70)
print("Method 1: Pooled OLS (omitting ability → biased)")
print("=" * 70)
print(model_pooled.summary().tables[1])

# Method 2: Fixed effects (unbiased)
model_fe = PanelOLS(
    df_panel['log_wage'],
    df_panel[['education', 'experience', 'union']],
    entity_effects=True
).fit(cov_type='clustered', cluster_entity=True)

print("\n" + "=" * 70)
print("Method 2: Fixed Effects (controlling for ability → unbiased)")
print("=" * 70)
print(model_fe)

# Compare estimates
print("\n" + "=" * 70)
print("Estimation Comparison")
print("=" * 70)
results_table = pd.DataFrame({
    'True Parameter': [0.08, 0.05, 0.10],
    'Pooled OLS': [model_pooled.params['education'],
               model_pooled.params['experience'],
               model_pooled.params['union']],
    'Fixed Effects': [model_fe.params['education'],
               model_fe.params['experience'],
               model_fe.params['union']]
}, index=['education', 'experience', 'union'])

print(results_table.round(4))

# Calculate bias
print("\nBias (Estimate - True Value):")
print(results_table[['Pooled OLS', 'Fixed Effects']]
      .sub(results_table['True Parameter'], axis=0).round(4))

Output Interpretation:

  • Pooled OLS: Overestimates education coefficient (because ability is omitted)
  • Fixed Effects: Close to true value (differencing eliminates ability)

One-Way FE vs Two-Way FE

One-Way Fixed Effects

Model:

$$y_{it} = \beta x_{it} + \alpha_i + \epsilon_{it}$$

Controls:

  • Individual fixed effects $\alpha_i$ (individual heterogeneity)

Python Implementation:

python
model_oneway = PanelOLS(y, X, entity_effects=True).fit()

Two-Way Fixed Effects

Model:

$$y_{it} = \beta x_{it} + \alpha_i + \lambda_t + \epsilon_{it}$$

Controls:

  • Individual fixed effects $\alpha_i$ (individual heterogeneity)
  • Time fixed effects $\lambda_t$ (macro trends)

Python Implementation:

python
model_twoway = PanelOLS(y, X,
                        entity_effects=True,
                        time_effects=True).fit()

When to Use Two-Way FE?

Scenario 1: Common Time Trends Exist

  • Example: Business cycles, inflation, technological progress
  • These factors affect all individuals simultaneously; if they are also correlated with your independent variables, omitting them biases the estimates

Scenario 2: DID Studies

  • Two-way FE is standard practice for DID
  • $\lambda_t$ controls for common time trends

Scenario 3: Avoiding Spurious Correlation

  • If both $y$ and $x$ have upward trends, the correlation might be due to common time factors
  • Time FE eliminates such spurious correlation

Python Comparison: One-Way vs Two-Way FE

python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS

# Simulate data: add time trend
np.random.seed(42)
data = []
for i in range(1, 101):
    alpha_i = np.random.normal(0, 1)
    for t in range(2010, 2020):
        # Time trend (affects all individuals)
        lambda_t = 0.1 * (t - 2010)

        # x also trends upward over time, so omitting lambda_t biases one-way FE
        x = 10 + 0.5 * (t - 2010) + np.random.normal(0, 2)
        y = 5 + 2 * x + alpha_i + lambda_t + np.random.normal(0, 1)

        data.append({'id': i, 'year': t, 'y': y, 'x': x})

df = pd.DataFrame(data)
df_panel = df.set_index(['id', 'year'])

# One-way FE (control for individuals only)
model_oneway = PanelOLS(df_panel['y'], df_panel[['x']],
                        entity_effects=True).fit()

# Two-way FE (control for individuals + time)
model_twoway = PanelOLS(df_panel['y'], df_panel[['x']],
                        entity_effects=True,
                        time_effects=True).fit()

print("=" * 70)
print("One-Way FE vs Two-Way FE")
print("=" * 70)
print(f"True parameter:  2.0000")
print(f"One-way FE:   {model_oneway.params['x']:.4f}")
print(f"Two-way FE:   {model_twoway.params['x']:.4f}")

Conclusion:

  • If time trends exist and are correlated with the regressors, one-way FE is biased when they are not controlled
  • Two-way FE simultaneously eliminates individual and time effects, and is more robust

Limitations of Fixed Effects

Limitation 1: Cannot Estimate Time-Invariant Variables

Problem: FE differencing eliminates all time-invariant variables

Examples:

  • Gender
  • Race
  • Birthplace
  • Industry (if no job changes)

Why?

  • For a time-invariant variable $z_i$, demeaning gives $z_i - \bar{z}_i = 0$ (always 0)
  • A coefficient cannot be estimated on a column of zeros

Solutions:

  1. Use a Random Effects (RE) model (if the RE assumption holds)
  2. Study interaction effects between time-invariant and time-varying variables (see the sketch after this list)
  3. Use the Mundlak approach (add individual means of the time-varying variables to RE)
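
Solution 2 can be sketched as follows. Assume a wage panel like the one in the complete example above, plus a hypothetical time-invariant female indicator (not part of the simulated data):

python
from linearmodels.panel import PanelOLS

# 'female' is a hypothetical time-invariant column: its main effect is
# absorbed by the entity effect, but its interaction with a time-varying
# variable is identified from within variation
df['educ_x_female'] = df['education'] * df['female']
dfp = df.set_index(['id', 'year'])
model_inter = PanelOLS(dfp['log_wage'],
                       dfp[['education', 'educ_x_female']],
                       entity_effects=True).fit()
# The educ_x_female coefficient: the extra return to education for women
print(model_inter.params)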

Limitation 2: Only Uses Within Variation

Problem: FE discards between variation

Consequence:

  • If independent variables have little within variation, FE estimates are inefficient
  • Example: Education level rarely changes in the short term

Example:

python
# Calculate variation proportions (assumes the wage panel df from the
# Complete Example above, not the y/x toy data)
total_var = df['education'].var()
within_var = df.groupby('id')['education'].apply(lambda x: (x - x.mean()).var()).mean()
between_var = df.groupby('id')['education'].mean().var()

print(f"Total variation:   {total_var:.2f}")
print(f"Within variation: {within_var:.2f} ({within_var/total_var*100:.1f}%)")
print(f"Between variation: {between_var:.2f} ({between_var/total_var*100:.1f}%)")

If within variation < 10%:

  • FE estimates will have large standard errors (imprecise)
  • Consider using RE (if Hausman test passes)

Limitation 3: Strict Exogeneity Assumption

Assumption:

$$E[\epsilon_{it} \mid x_{i1}, x_{i2}, \dots, x_{iT}, \alpha_i] = 0$$

Meaning:

  • The error term is uncorrelated with the independent variables in all periods
  • Not only the current $x_{it}$, but also past $x_{i,t-1}$ and future $x_{i,t+1}$ values

Scenarios Where It Is Violated:

  • Feedback Effect: $y_{it}$ affects $x_{i,t+1}$
  • Simultaneity: $x_{it}$ and $y_{it}$ affect each other
  • Measurement Error: $x_{it}$ is measured with error

Solutions:

  • Use instrumental variables (IV-FE)
  • Use dynamic panel models (Arellano-Bond)

Limitation 4: Bad Control Problem

Problem: Including variables affected by treatment as controls

Example: Studying the effect of education on wages

python
# Wrong: occupation is a result of education (a mediator)
model = PanelOLS(log_wage, pd.concat([education, occupation], axis=1),
                 entity_effects=True).fit()

Why Wrong?

  • Education → Occupation → Wage (causal chain)
  • Controlling for occupation blocks part of education's effect
  • Estimates the direct effect rather than the total effect

Decision Rule:

  • Control: Confounders (variables that simultaneously affect $x$ and $y$)
  • Don't control: Mediators (on the causal path $x \to m \to y$) or colliders (affected by both $x$ and $y$); a corrective sketch follows below
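
A corrective sketch, in the same placeholder style as the "wrong" snippet above (log_wage, education, occupation are loose names, not runnable as-is):

python
# Total effect of education: leave the mediator (occupation) out entirely
model_total = PanelOLS(log_wage, education, entity_effects=True).fit()

# Direct effect only: include the mediator (rarely what you want)
model_direct = PanelOLS(log_wage, pd.concat([education, occupation], axis=1),
                        entity_effects=True).fit()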

Complete Case Study: Mincer Wage Equation

Background

The Mincer Equation (1974) is the most classic model in labor economics:

$$\log(\text{wage}_{it}) = \beta_0 + \beta_1 \, \text{educ}_{it} + \beta_2 \, \text{exper}_{it} + \beta_3 \, \text{exper}_{it}^2 + \alpha_i + \epsilon_{it}$$

Interpretation:

  • $\beta_1$: Returns to education (each additional year of education increases wage by about $100\beta_1\%$)
  • $\beta_2 > 0,\ \beta_3 < 0$: Nonlinear effect of experience (wages first increase, then decrease)
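
The wage profile peaks where the marginal effect of experience is zero:

$$\frac{\partial \log(\text{wage})}{\partial \, \text{exper}} = \beta_2 + 2\beta_3 \, \text{exper} = 0 \quad\Longrightarrow\quad \text{exper}^* = -\frac{\beta_2}{2\beta_3}$$

With the simulated parameters below ($\beta_2 = 0.05$, $\beta_3 = -0.001$), the peak is at $0.05 / 0.002 = 25$ years, which the code verifies at the end.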

Complete Python Implementation

python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']

# Simulate realistic Mincer data
np.random.seed(2024)

N = 1000  # 1000 workers
T = 10    # 10 years

data = []
for i in range(N):
    # Individual fixed effect (ability)
    ability = np.random.normal(0, 0.4)

    # Initial characteristics; education is correlated with ability
    # (higher-ability workers get more schooling), which is exactly what
    # biases pooled OLS
    education_0 = float(np.clip(np.round((12 + 4 * ability + np.random.normal(0, 1.2)) / 2) * 2, 10, 16))
    experience_0 = np.random.randint(0, 10)

    for t in range(T):
        year = 2010 + t

        # Education (may increase)
        education = education_0 + (0.5 if (t > 3 and np.random.rand() < 0.05) else 0)

        # Experience
        experience = experience_0 + t

        # Log wage (Mincer equation)
        log_wage = (1.8 + 0.08 * education + 0.05 * experience -
                    0.001 * experience**2 + ability + np.random.normal(0, 0.15))

        data.append({
            'id': i,
            'year': year,
            'log_wage': log_wage,
            'education': education,
            'experience': experience,
            'ability': ability
        })

df = pd.DataFrame(data)
df['experience_sq'] = df['experience'] ** 2

print("=" * 70)
print("Mincer Wage Equation: Panel Data Analysis")
print("=" * 70)
print(f"Sample size: {len(df):,}")
print(f"Number of individuals: {df['id'].nunique()}")
print(f"Time span: {df['year'].min()} - {df['year'].max()}")

# Set panel index
df_panel = df.set_index(['id', 'year'])

# Model 1: Pooled OLS
import statsmodels.api as sm
X1 = sm.add_constant(df[['education', 'experience', 'experience_sq']])
model1 = sm.OLS(df['log_wage'], X1).fit()

# Model 2: Fixed effects (one-way)
model2 = PanelOLS(df_panel['log_wage'],
                  df_panel[['education', 'experience', 'experience_sq']],
                  entity_effects=True).fit(cov_type='clustered',
                                           cluster_entity=True)

# Model 3: Fixed effects (two-way)
# In this simulation experience = experience_0 + t, so it is perfectly
# collinear with entity + time effects; drop_absorbed=True drops it
# (with a warning) instead of raising a rank error
model3 = PanelOLS(df_panel['log_wage'],
                  df_panel[['education', 'experience', 'experience_sq']],
                  entity_effects=True,
                  time_effects=True,
                  drop_absorbed=True).fit(cov_type='clustered',
                                          cluster_entity=True)

# Compare results
print("\n" + "=" * 70)
print("Regression Results Comparison")
print("=" * 70)

# summary_col only accepts statsmodels results, so build the comparison
# table manually from each model's parameter estimates
results = pd.DataFrame({
    'Pooled OLS': model1.params,
    'One-Way FE': model2.params,
    'Two-Way FE': model3.params,
})
print(results.round(4))
print(f"\nN = {len(df):,}")

# Calculate returns to education
print("\n" + "=" * 70)
print("Returns to Education Estimates")
print("=" * 70)
print(f"True parameter:   8.0%")
print(f"Pooled OLS:   {model1.params['education']*100:.2f}%  (overestimate)")
print(f"One-way FE:    {model2.params['education']*100:.2f}%")
print(f"Two-way FE:    {model3.params['education']*100:.2f}%")

# Visualize: experience-wage curve
experience_range = np.linspace(0, 30, 100)
wage_curve = (model2.params['experience'] * experience_range +
              model2.params['experience_sq'] * experience_range**2)

plt.figure(figsize=(10, 6))
plt.plot(experience_range, wage_curve * 100, linewidth=3, color='darkblue')
plt.xlabel('Work Experience (years)', fontweight='bold', fontsize=12)
plt.ylabel('Log Wage Change (%)', fontweight='bold', fontsize=12)
plt.title('Marginal Effect of Experience on Wages (Mincer Equation)', fontweight='bold', fontsize=14)
plt.grid(alpha=0.3)
plt.axhline(0, color='black', linewidth=0.8, linestyle='--')
plt.tight_layout()
plt.show()

# Calculate optimal experience
optimal_exp = -model2.params['experience'] / (2 * model2.params['experience_sq'])
print(f"\nExperience years at peak wage: {optimal_exp:.1f} years")

Output Interpretation:

  1. Pooled OLS: Overestimates returns to education (omits ability)
  2. One-way FE: Controls for individual heterogeneity, close to true value
  3. Two-way FE: Further controls for time trends, most robust (in this simulation, experience itself is fully absorbed by the time effects and is dropped)
  4. Experience curve: Increases then decreases (inverted U-shape)

Section Summary

Key Points

  1. Essence of FE: Eliminate unobserved individual heterogeneity through differencing

  2. Three Estimation Methods:

    • Within transformation ⭐ Most common
    • LSDV (dummy variables): Equivalent to within transformation
    • First difference: Equivalent when $T = 2$; otherwise the within transformation is usually better
  3. One-Way vs Two-Way FE:

    • One-way: Control for individual heterogeneity
    • Two-way: Control for both individual + time trends
  4. Advantages of FE:

    • Control for unobservables
    • Allow $\alpha_i$ to be correlated with $x_{it}$
    • Powerful tool for causal identification
  5. Limitations of FE:

    • Cannot estimate time-invariant variables
    • Only uses within variation (efficiency loss)
    • Requires strict exogeneity
  6. Practical Tools:

    • linearmodels.PanelOLS
    • Clustered standard errors (must use!)

Next Steps

In Section 4: Random Effects Models, we will learn:

  • RE model theory and GLS estimation
  • Choosing between FE vs RE (Hausman test)
  • When RE is better than FE

The power of differencing, the cornerstone of causal inference!

Released under the MIT License. Content © Author.