
8.3 Fixed Effects Models

The Power of Differencing: The Ultimate Weapon for Eliminating Unobserved Heterogeneity



Section Objectives

  • Deeply understand the mathematical principles of Fixed Effects (FE) models
  • Master three FE estimation methods (within transformation, LSDV, first difference)
  • Distinguish between one-way FE and two-way FE
  • Use linearmodels.PanelOLS for FE regression
  • Understand FE identification assumptions and causal interpretation
  • Handle common pitfalls with FE (time-invariant variables, bad controls)
  • Complete case study: Wage determinants (Mincer equation)

Core Idea of Fixed Effects Models

The Origin of the Problem: Omitted Variable Bias

Recall the example from the previous section: studying the effect of education on wages.

True Model:

$$\log(\text{wage}_{it}) = \beta_0 + \beta_1 \, \text{educ}_{it} + \gamma \, \text{ability}_i + \epsilon_{it}$$

Problem:

  • $\text{ability}_i$ is unobservable and cannot be directly measured
  • $\text{ability}_i$ is correlated with $\text{educ}_{it}$ (smart people get more education)
  • If $\text{ability}_i$ is ignored, $\hat{\beta}_1$ will be biased

Traditional Solutions:

  1. Find a "perfect" proxy variable to measure ability → Nearly impossible
  2. Eliminate correlation through randomized experiments → Costly, impractical

Fixed Effects Solution: Use the time dimension of panel data to eliminate $\text{ability}_i$ through differencing


FE Core Intuition: Differencing Eliminates Fixed Effects

Suppose we observe the same person for two years:

Year 1:

$$y_{i1} = \beta x_{i1} + \alpha_i + \epsilon_{i1}$$

Year 2:

$$y_{i2} = \beta x_{i2} + \alpha_i + \epsilon_{i2}$$

Difference (Year 2 - Year 1):

$$y_{i2} - y_{i1} = \beta (x_{i2} - x_{i1}) + (\epsilon_{i2} - \epsilon_{i1})$$

The Magic: $\alpha_i$ is eliminated!

Intuition:

  • $\alpha_i$ (ability) doesn't change over the two years (it's fixed)
  • After differencing, we only use within-individual variation over time
  • Change in wage = change in education × rate of return

This is the essence of fixed effects!
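
To make this concrete, here is a minimal numeric sketch (the numbers are invented for illustration): with two periods, FE estimation reduces to a regression through the origin on the changes.

python
import numpy as np

# Two invented workers observed in two years
educ_y1, educ_y2 = np.array([12.0, 16.0]), np.array([13.0, 16.0])
wage_y1, wage_y2 = np.array([2.50, 2.90]), np.array([2.58, 2.91])  # log wages

d_educ = educ_y2 - educ_y1   # within-person change in education: [1, 0]
d_wage = wage_y2 - wage_y1   # within-person change in log wage

# OLS through the origin on the differenced data; worker 2, whose
# education did not change, contributes nothing to the estimate
beta = (d_educ @ d_wage) / (d_educ @ d_educ)
print(f"Differenced estimate of the return to education: {beta:.3f}")  # 0.080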


Mathematical Expression of Fixed Effects Models

General FE Model

$$y_{it} = \beta_1 x_{1,it} + \beta_2 x_{2,it} + \cdots + \beta_K x_{K,it} + \alpha_i + \epsilon_{it}$$

Symbol Definitions:

  • $i = 1, \dots, N$: Individual index
  • $t = 1, \dots, T$: Time index
  • $y_{it}$: Dependent variable
  • $x_{k,it}$: $k$-th independent variable
  • $\alpha_i$: Individual fixed effect (time-invariant, can be correlated with $x_{it}$)
  • $\epsilon_{it}$: Random error term (idiosyncratic error)

Key Assumptions:

  1. Strict Exogeneity: $E[\epsilon_{it} \mid x_{i1}, \dots, x_{iT}, \alpha_i] = 0$

  2. Fixed effects are time-invariant: $\alpha_{it} = \alpha_i$ for all $t$

  3. Correlation with the regressors is allowed: $\mathrm{Cov}(x_{it}, \alpha_i) \neq 0$ is permitted (this is what distinguishes FE from RE)


Three Estimation Methods for Fixed Effects

Method 1: Within Transformation (Demeaning)

Core Idea: Demean each variable

Step 1: Calculate Individual Means

$$\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}, \qquad \bar{x}_i = \frac{1}{T} \sum_{t=1}^{T} x_{it}$$

Note: $\bar{\alpha}_i = \alpha_i$ (because $\alpha_i$ doesn't vary over time)

Step 2: Subtract the Mean Equation from the Original Equation

Original equation:

$$y_{it} = \beta x_{it} + \alpha_i + \epsilon_{it}$$

Mean equation (averaging both sides over time):

$$\bar{y}_i = \beta \bar{x}_i + \alpha_i + \bar{\epsilon}_i$$

Difference (demeaning):

$$y_{it} - \bar{y}_i = \beta (x_{it} - \bar{x}_i) + (\epsilon_{it} - \bar{\epsilon}_i) \quad\Longleftrightarrow\quad \ddot{y}_{it} = \beta \ddot{x}_{it} + \ddot{\epsilon}_{it}$$

where $\ddot{y}_{it} = y_{it} - \bar{y}_i$ is the demeaned variable (similarly for $\ddot{x}_{it}$ and $\ddot{\epsilon}_{it}$)

Key Result: $\alpha_i$ is eliminated!

Step 3: OLS Estimation of the Demeaned Equation

This is the Fixed Effects Estimator (Within Estimator)


Python Implementation: Manual Within Transformation

python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulate data
np.random.seed(42)
data = []
for i in range(1, 101):  # 100 individuals
    alpha_i = np.random.normal(0, 1)  # Individual fixed effect
    for t in range(2015, 2020):  # 5 years
        x = 10 + t - 2015 + np.random.normal(0, 2)
        y = 5 + 2 * x + alpha_i + np.random.normal(0, 1)
        data.append({'id': i, 'year': t, 'y': y, 'x': x})

df = pd.DataFrame(data)

print("=" * 70)
print("Original Data")
print("=" * 70)
print(df.head(10))

# Manual within transformation
# Step 1: Calculate individual means
df['y_mean'] = df.groupby('id')['y'].transform('mean')
df['x_mean'] = df.groupby('id')['x'].transform('mean')

# Step 2: Demean
df['y_within'] = df['y'] - df['y_mean']
df['x_within'] = df['x'] - df['x_mean']

print("\n" + "=" * 70)
print("Demeaned Data (first 10 rows)")
print("=" * 70)
print(df[['id', 'year', 'y', 'y_within', 'x', 'x_within']].head(10))

# Step 3: OLS regression on demeaned data (no intercept!)
model_within = sm.OLS(df['y_within'], df['x_within']).fit()

print("\n" + "=" * 70)
print("Within Transformation FE Regression Results (manual implementation)")
print("=" * 70)
print(f"Coefficient: {model_within.params['x_within']:.4f}")
print(f"Standard error: {model_within.bse['x_within']:.4f}")
print(f"True parameter: 2.0000")

Key Observation:

  • After demeaning, each individual's mean becomes 0
  • Only within variation remains
  • No need for intercept in regression (intercept is 0 after demeaning)
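
A quick sanity check (assuming df from the block above is still in scope) confirms the first point:

python
# After demeaning, every individual's mean of the transformed variables
# should be zero up to floating-point error
print(df.groupby('id')['x_within'].mean().abs().max())  # ~1e-15
print(df.groupby('id')['y_within'].mean().abs().max())  # ~1e-15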

Method 2: Least Squares Dummy Variable (LSDV)

Core Idea: Add a dummy variable for each individual

$$y_{it} = \alpha_1 + \beta x_{it} + \sum_{j=2}^{N} \gamma_j D_{ji} + \epsilon_{it}$$

where $D_{ji} = 1$ if $i = j$, otherwise 0.

Interpretation:

  • $N - 1$ dummy variables (1st individual as the reference group)
  • $\gamma_j$: Individual $j$'s fixed effect difference relative to individual 1
  • $\alpha_i = \alpha_1 + \gamma_i$ (if $i \geq 2$) or $\alpha_i = \alpha_1$ (reference group)

Pros:

  • Can estimate each individual's fixed effect
  • Equivalent to the within transformation (coefficients are identical)

Cons:

  • If $N$ is large, many parameters must be estimated ($N - 1$ dummies plus $K$ slopes and the intercept)
  • Slow computation (but results identical to the within transformation)

Python Implementation:

python
import pandas as pd
import statsmodels.api as sm

# Create N-1 dummy variables (drop the first individual as the reference
# group; dtype=float so statsmodels receives numeric columns)
dummies = pd.get_dummies(df['id'], prefix='id', drop_first=True, dtype=float)

# Merge data
X_lsdv = pd.concat([df[['x']], dummies], axis=1)
X_lsdv = sm.add_constant(X_lsdv)

# OLS regression
model_lsdv = sm.OLS(df['y'], X_lsdv).fit()

print("=" * 70)
print("LSDV Regression Results")
print("=" * 70)
print(f"x coefficient: {model_lsdv.params['x']:.4f}")
print(f"Number of estimated fixed effects: {len(dummies.columns)}")

Note:

  • LSDV and the within transformation produce identical coefficients
  • But standard errors may differ slightly (degrees-of-freedom adjustment; see the check below)
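
The degrees-of-freedom adjustment can be verified directly. This sketch (assuming df, model_within, and model_lsdv from the blocks above) reconciles the two standard errors:

python
import numpy as np

# The naive within regression uses NT - K residual degrees of freedom,
# ignoring the N individual means that were estimated; LSDV correctly
# uses NT - N - K
N, NT, K = df['id'].nunique(), len(df), 1
correction = np.sqrt((NT - K) / (NT - N - K))

print(f"Naive within SE:     {model_within.bse['x_within']:.4f}")
print(f"Corrected within SE: {model_within.bse['x_within'] * correction:.4f}")
print(f"LSDV SE:             {model_lsdv.bse['x']:.4f}")  # matches the corrected SE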

Method 3: First Difference

Core Idea: Difference adjacent time periods

$$y_{it} - y_{i,t-1} = \beta (x_{it} - x_{i,t-1}) + (\epsilon_{it} - \epsilon_{i,t-1})$$

Abbreviated as:

$$\Delta y_{it} = \beta \, \Delta x_{it} + \Delta \epsilon_{it}$$

Difference from the Within Transformation:

  • Within transformation: $x_{it} - \bar{x}_i$ (subtract the mean of all periods)
  • First difference: $x_{it} - x_{i,t-1}$ (subtract the previous period's value)

When Are They Equivalent?

  • If $T = 2$ (only two periods), the two methods are completely equivalent
  • If $T > 2$, the within transformation is usually more efficient (when $\epsilon_{it}$ is serially uncorrelated)

When to Prefer First Difference?

  • If the error term $\epsilon_{it}$ follows a random walk (then $\Delta \epsilon_{it}$ is serially uncorrelated)
  • If the independent variables have severe serial correlation

Python Implementation:

python
# First difference
df_sorted = df.sort_values(['id', 'year'])
df_sorted['y_diff'] = df_sorted.groupby('id')['y'].diff()
df_sorted['x_diff'] = df_sorted.groupby('id')['x'].diff()

# Drop first period (no difference)
df_fd = df_sorted.dropna(subset=['y_diff', 'x_diff'])

# OLS regression (no intercept)
model_fd = sm.OLS(df_fd['y_diff'], df_fd['x_diff']).fit()

print("=" * 70)
print("First Difference FE Regression Results")
print("=" * 70)
print(f"Coefficient: {model_fd.params['x_diff']:.4f}")

linearmodels.PanelOLS: Professional Tool

Basic Syntax

python
from linearmodels.panel import PanelOLS

# Set panel index
df_panel = df.set_index(['id', 'year'])

# Fixed effects regression
model = PanelOLS(
    dependent=df_panel['y'],
    exog=df_panel[['x1', 'x2']],
    entity_effects=True,      # Individual fixed effects
    time_effects=False        # Time fixed effects (optional)
).fit(
    cov_type='clustered',     # Standard error type
    cluster_entity=True       # Cluster at individual level
)

print(model)

Parameter Details

1. entity_effects: Individual Fixed Effects

python
# Enable individual fixed effects (recommended)
entity_effects=True

Equivalent to within transformation for each variable
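
A quick consistency check (assuming df and model_within from the simulation earlier in this section): PanelOLS with entity_effects=True reproduces the manual within-transformation estimate.

python
from linearmodels.panel import PanelOLS

df_check = df.set_index(['id', 'year'])
fe_check = PanelOLS(df_check['y'], df_check[['x']], entity_effects=True).fit()
print(f"PanelOLS:      {fe_check.params['x']:.4f}")
print(f"Manual within: {model_within.params['x_within']:.4f}")  # identical point estimates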

2. time_effects: Time Fixed Effects

python
# Enable time fixed effects
time_effects=True

Controls for common time trends across all individuals (macro shocks, policy changes, etc.)

3. cov_type: Standard Error Type

| Option | Meaning | When to Use |
|---|---|---|
| 'unadjusted' | Classical OLS SE | Teaching only |
| 'robust' | Heteroskedasticity-robust SE | When heteroskedasticity is present |
| 'clustered' | Clustered SE | Standard choice for panel data ⭐ |
| 'kernel' | Driscoll-Kraay (Newey-West-style kernel) SE | Severe serial correlation |

Recommended:

python
cov_type='clustered',
cluster_entity=True  # Cluster at individual level

Complete Example: Wage Determinants

python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']
sns.set_style("whitegrid")

# Simulate data: wage panel
np.random.seed(123)

N = 500  # 500 workers
T = 7    # 7 years (2015-2021)

data = []
for i in range(N):
    # Individual fixed effect (ability, family background, etc.)
    ability = np.random.normal(0, 0.3)

    # Initial education level, correlated with ability -- this correlation
    # is the source of the omitted-variable bias in pooled OLS below
    education_0 = float(np.clip(np.round((12 + 6 * ability + np.random.normal(0, 1.5)) / 2) * 2, 10, 16))

    for t in range(T):
        year = 2015 + t

        # Education (may increase, e.g., night school)
        education = education_0 + (0.1 * t if np.random.rand() < 0.1 else 0)

        # Work experience
        experience = t

        # Union membership (may change over time)
        union = 1 if np.random.rand() < 0.3 else 0

        # Log wage
        # True parameters: education=0.08, experience=0.05, union=0.10
        log_wage = (1.5 + 0.08 * education + 0.05 * experience +
                    0.10 * union + ability + np.random.normal(0, 0.1))

        data.append({
            'id': i,
            'year': year,
            'log_wage': log_wage,
            'education': education,
            'experience': experience,
            'union': union,
            'ability': ability  # Actually unobservable
        })

df = pd.DataFrame(data)

print("=" * 70)
print("Data Summary")
print("=" * 70)
print(df[['log_wage', 'education', 'experience', 'union']].describe())
print(f"\nSample size: {len(df)}")
print(f"Number of individuals: {df['id'].nunique()}")
print(f"Number of time periods: {df['year'].nunique()}")

# Set panel index
df_panel = df.set_index(['id', 'year'])

# Method 1: Pooled OLS (biased)
import statsmodels.api as sm
X_pooled = sm.add_constant(df[['education', 'experience', 'union']])
model_pooled = sm.OLS(df['log_wage'], X_pooled).fit()

print("\n" + "=" * 70)
print("Method 1: Pooled OLS (omitting ability → biased)")
print("=" * 70)
print(model_pooled.summary().tables[1])

# Method 2: Fixed effects (unbiased)
model_fe = PanelOLS(
    df_panel['log_wage'],
    df_panel[['education', 'experience', 'union']],
    entity_effects=True
).fit(cov_type='clustered', cluster_entity=True)

print("\n" + "=" * 70)
print("Method 2: Fixed Effects (controlling for ability → unbiased)")
print("=" * 70)
print(model_fe)

# Compare estimates
print("\n" + "=" * 70)
print("Estimation Comparison")
print("=" * 70)
results_table = pd.DataFrame({
    'True Parameter': [0.08, 0.05, 0.10],
    'Pooled OLS': [model_pooled.params['education'],
               model_pooled.params['experience'],
               model_pooled.params['union']],
    'Fixed Effects': [model_fe.params['education'],
               model_fe.params['experience'],
               model_fe.params['union']]
}, index=['education', 'experience', 'union'])

print(results_table.round(4))

# Calculate bias
print("\nBias (Estimate - True Value):")
print(results_table[['Pooled OLS', 'Fixed Effects']]
      .sub(results_table['True Parameter'], axis=0).round(4))

Output Interpretation:

  • Pooled OLS: Overestimates education coefficient (because ability is omitted)
  • Fixed Effects: Close to true value (differencing eliminates ability)

One-Way FE vs Two-Way FE

One-Way Fixed Effects

Model:

$$y_{it} = \beta x_{it} + \alpha_i + \epsilon_{it}$$

Controls:

  • Individual fixed effects $\alpha_i$ (individual heterogeneity)

Python Implementation:

python
model_oneway = PanelOLS(y, X, entity_effects=True).fit()

Two-Way Fixed Effects

Model:

$$y_{it} = \beta x_{it} + \alpha_i + \lambda_t + \epsilon_{it}$$

Controls:

  • Individual fixed effects $\alpha_i$ (individual heterogeneity)
  • Time fixed effects $\lambda_t$ (macro trends)

Python Implementation:

python
model_twoway = PanelOLS(y, X,
                        entity_effects=True,
                        time_effects=True).fit()

When to Use Two-Way FE?

Scenario 1: Common Time Trends Exist

  • Example: Business cycles, inflation, technological progress
  • These factors affect all individuals simultaneously; if they are also correlated with your independent variables, omitting them biases the estimates

Scenario 2: DID Studies

  • Two-way FE is standard practice for DID
  • $\lambda_t$ controls for common time trends

Scenario 3: Avoiding Spurious Correlation

  • If both $y$ and $x$ have upward trends, the correlation might be due to common time factors
  • Time FE eliminates such spurious correlation

Python Comparison: One-Way vs Two-Way FE

python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS

# Simulate data: add time trend
np.random.seed(42)
data = []
for i in range(1, 101):
    alpha_i = np.random.normal(0, 1)
    for t in range(2010, 2020):
        # Time trend (affects all individuals)
        lambda_t = 0.1 * (t - 2010)

        # x also trends upward over time, so omitting lambda_t biases one-way FE
        x = 10 + 0.5 * (t - 2010) + np.random.normal(0, 2)
        y = 5 + 2 * x + alpha_i + lambda_t + np.random.normal(0, 1)

        data.append({'id': i, 'year': t, 'y': y, 'x': x})

df = pd.DataFrame(data)
df_panel = df.set_index(['id', 'year'])

# One-way FE (control for individuals only)
model_oneway = PanelOLS(df_panel['y'], df_panel[['x']],
                        entity_effects=True).fit()

# Two-way FE (control for individuals + time)
model_twoway = PanelOLS(df_panel['y'], df_panel[['x']],
                        entity_effects=True,
                        time_effects=True).fit()

print("=" * 70)
print("One-Way FE vs Two-Way FE")
print("=" * 70)
print(f"True parameter:  2.0000")
print(f"One-way FE:   {model_oneway.params['x']:.4f}")
print(f"Two-way FE:   {model_twoway.params['x']:.4f}")

Conclusion:

  • If time trends exist and are correlated with the regressors, one-way FE is biased when they are not controlled
  • Two-way FE simultaneously eliminates individual and time effects, and is more robust

Limitations of Fixed Effects

Limitation 1: Cannot Estimate Time-Invariant Variables

Problem: FE differencing eliminates all time-invariant variables

Examples:

  • Gender
  • Race
  • Birthplace
  • Industry (if no job changes)

Why?

  • For a time-invariant variable $z_i$, demeaning gives $z_i - \bar{z}_i = 0$ (always 0)
  • A coefficient cannot be estimated on a column of zeros

Solutions:

  1. Use a Random Effects (RE) model (if the RE assumption holds)
  2. Study interaction effects between time-invariant and time-varying variables (see the sketch after this list)
  3. Use the Mundlak approach (add individual means of the time-varying variables to RE)
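
Solution 2 can be sketched as follows. Assume a wage panel like the one in the complete example above, plus a hypothetical time-invariant female indicator (not part of the simulated data):

python
from linearmodels.panel import PanelOLS

# 'female' is a hypothetical time-invariant column: its main effect is
# absorbed by the entity effect, but its interaction with a time-varying
# variable is identified from within variation
df['educ_x_female'] = df['education'] * df['female']
dfp = df.set_index(['id', 'year'])
model_inter = PanelOLS(dfp['log_wage'],
                       dfp[['education', 'educ_x_female']],
                       entity_effects=True).fit()
# The educ_x_female coefficient: the extra return to education for women
print(model_inter.params)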

Limitation 2: Only Uses Within Variation

Problem: FE discards between variation

Consequence:

  • If independent variables have little within variation, FE estimates are inefficient
  • Example: Education level rarely changes in the short term

Example:

python
# Calculate variation proportions (assumes the wage panel df from the
# Complete Example above, not the y/x toy data)
total_var = df['education'].var()
within_var = df.groupby('id')['education'].apply(lambda x: (x - x.mean()).var()).mean()
between_var = df.groupby('id')['education'].mean().var()

print(f"Total variation:   {total_var:.2f}")
print(f"Within variation: {within_var:.2f} ({within_var/total_var*100:.1f}%)")
print(f"Between variation: {between_var:.2f} ({between_var/total_var*100:.1f}%)")

If within variation < 10%:

  • FE estimates will have large standard errors (imprecise)
  • Consider using RE (if Hausman test passes)

Limitation 3: Strict Exogeneity Assumption

Assumption:

$$E[\epsilon_{it} \mid x_{i1}, x_{i2}, \dots, x_{iT}, \alpha_i] = 0$$

Meaning:

  • The error term is uncorrelated with the independent variables in all periods
  • Not only the current $x_{it}$, but also past $x_{i,t-1}$ and future $x_{i,t+1}$ values

Scenarios Where It Is Violated:

  • Feedback Effect: $y_{it}$ affects $x_{i,t+1}$
  • Simultaneity: $x_{it}$ and $y_{it}$ affect each other
  • Measurement Error: $x_{it}$ is measured with error

Solutions:

  • Use instrumental variables (IV-FE)
  • Use dynamic panel models (Arellano-Bond)

Limitation 4: Bad Control Problem

Problem: Including variables affected by treatment as controls

Example: Studying the effect of education on wages

python
# Wrong: occupation is a result of education (a mediator)
model = PanelOLS(log_wage, pd.concat([education, occupation], axis=1),
                 entity_effects=True).fit()

Why Wrong?

  • Education → Occupation → Wage (causal chain)
  • Controlling for occupation blocks part of education's effect
  • Estimates the direct effect rather than the total effect

Decision Rule:

  • Control: Confounders (variables that simultaneously affect $x$ and $y$)
  • Don't control: Mediators (on the causal path $x \to m \to y$) or colliders (affected by both $x$ and $y$); a corrective sketch follows below
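
A corrective sketch, in the same placeholder style as the "wrong" snippet above (log_wage, education, occupation are loose names, not runnable as-is):

python
# Total effect of education: leave the mediator (occupation) out entirely
model_total = PanelOLS(log_wage, education, entity_effects=True).fit()

# Direct effect only: include the mediator (rarely what you want)
model_direct = PanelOLS(log_wage, pd.concat([education, occupation], axis=1),
                        entity_effects=True).fit()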

Complete Case Study: Mincer Wage Equation

Background

The Mincer Equation (1974) is the most classic model in labor economics:

$$\log(\text{wage}_{it}) = \beta_0 + \beta_1 \, \text{educ}_{it} + \beta_2 \, \text{exper}_{it} + \beta_3 \, \text{exper}_{it}^2 + \alpha_i + \epsilon_{it}$$

Interpretation:

  • $\beta_1$: Returns to education (each additional year of education increases wage by about $100\beta_1\%$)
  • $\beta_2 > 0,\ \beta_3 < 0$: Nonlinear effect of experience (wages first increase, then decrease)
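
The wage profile peaks where the marginal effect of experience is zero:

$$\frac{\partial \log(\text{wage})}{\partial \, \text{exper}} = \beta_2 + 2\beta_3 \, \text{exper} = 0 \quad\Longrightarrow\quad \text{exper}^* = -\frac{\beta_2}{2\beta_3}$$

With the simulated parameters below ($\beta_2 = 0.05$, $\beta_3 = -0.001$), the peak is at $0.05 / 0.002 = 25$ years, which the code verifies at the end.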

Complete Python Implementation

python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']

# Simulate realistic Mincer data
np.random.seed(2024)

N = 1000  # 1000 workers
T = 10    # 10 years

data = []
for i in range(N):
    # Individual fixed effect (ability)
    ability = np.random.normal(0, 0.4)

    # Initial characteristics; education is correlated with ability
    # (higher-ability workers get more schooling), which is exactly what
    # biases pooled OLS
    education_0 = float(np.clip(np.round((12 + 4 * ability + np.random.normal(0, 1.2)) / 2) * 2, 10, 16))
    experience_0 = np.random.randint(0, 10)

    for t in range(T):
        year = 2010 + t

        # Education (may increase)
        education = education_0 + (0.5 if (t > 3 and np.random.rand() < 0.05) else 0)

        # Experience
        experience = experience_0 + t

        # Log wage (Mincer equation)
        log_wage = (1.8 + 0.08 * education + 0.05 * experience -
                    0.001 * experience**2 + ability + np.random.normal(0, 0.15))

        data.append({
            'id': i,
            'year': year,
            'log_wage': log_wage,
            'education': education,
            'experience': experience,
            'ability': ability
        })

df = pd.DataFrame(data)
df['experience_sq'] = df['experience'] ** 2

print("=" * 70)
print("Mincer Wage Equation: Panel Data Analysis")
print("=" * 70)
print(f"Sample size: {len(df):,}")
print(f"Number of individuals: {df['id'].nunique()}")
print(f"Time span: {df['year'].min()} - {df['year'].max()}")

# Set panel index
df_panel = df.set_index(['id', 'year'])

# Model 1: Pooled OLS
import statsmodels.api as sm
X1 = sm.add_constant(df[['education', 'experience', 'experience_sq']])
model1 = sm.OLS(df['log_wage'], X1).fit()

# Model 2: Fixed effects (one-way)
model2 = PanelOLS(df_panel['log_wage'],
                  df_panel[['education', 'experience', 'experience_sq']],
                  entity_effects=True).fit(cov_type='clustered',
                                           cluster_entity=True)

# Model 3: Fixed effects (two-way)
# In this simulation experience = experience_0 + t, so it is perfectly
# collinear with entity + time effects; drop_absorbed=True drops it
# (with a warning) instead of raising a rank error
model3 = PanelOLS(df_panel['log_wage'],
                  df_panel[['education', 'experience', 'experience_sq']],
                  entity_effects=True,
                  time_effects=True,
                  drop_absorbed=True).fit(cov_type='clustered',
                                          cluster_entity=True)

# Compare results
print("\n" + "=" * 70)
print("Regression Results Comparison")
print("=" * 70)

# summary_col only accepts statsmodels results, so build the comparison
# table manually from each model's parameter estimates
results = pd.DataFrame({
    'Pooled OLS': model1.params,
    'One-Way FE': model2.params,
    'Two-Way FE': model3.params,
})
print(results.round(4))
print(f"\nN = {len(df):,}")

# Calculate returns to education
print("\n" + "=" * 70)
print("Returns to Education Estimates")
print("=" * 70)
print(f"True parameter:   8.0%")
print(f"Pooled OLS:   {model1.params['education']*100:.2f}%  (overestimate)")
print(f"One-way FE:    {model2.params['education']*100:.2f}%")
print(f"Two-way FE:    {model3.params['education']*100:.2f}%")

# Visualize: experience-wage curve
experience_range = np.linspace(0, 30, 100)
wage_curve = (model2.params['experience'] * experience_range +
              model2.params['experience_sq'] * experience_range**2)

plt.figure(figsize=(10, 6))
plt.plot(experience_range, wage_curve * 100, linewidth=3, color='darkblue')
plt.xlabel('Work Experience (years)', fontweight='bold', fontsize=12)
plt.ylabel('Log Wage Change (%)', fontweight='bold', fontsize=12)
plt.title('Marginal Effect of Experience on Wages (Mincer Equation)', fontweight='bold', fontsize=14)
plt.grid(alpha=0.3)
plt.axhline(0, color='black', linewidth=0.8, linestyle='--')
plt.tight_layout()
plt.show()

# Calculate optimal experience
optimal_exp = -model2.params['experience'] / (2 * model2.params['experience_sq'])
print(f"\nExperience years at peak wage: {optimal_exp:.1f} years")

Output Interpretation:

  1. Pooled OLS: Overestimates returns to education (omits ability)
  2. One-way FE: Controls for individual heterogeneity, close to true value
  3. Two-way FE: Further controls for time trends, most robust (in this simulation, experience itself is fully absorbed by the time effects and is dropped)
  4. Experience curve: Increases then decreases (inverted U-shape)

Section Summary

Key Points

  1. Essence of FE: Eliminate unobserved individual heterogeneity through differencing

  2. Three Estimation Methods:

    • Within transformation ⭐ Most common
    • LSDV (dummy variables): Equivalent to within transformation
    • First difference: Equivalent when $T = 2$; otherwise the within transformation is usually better
  3. One-Way vs Two-Way FE:

    • One-way: Control for individual heterogeneity
    • Two-way: Control for both individual + time trends
  4. Advantages of FE:

    • Control for unobservables
    • Allow $\alpha_i$ to be correlated with $x_{it}$
    • Powerful tool for causal identification
  5. Limitations of FE:

    • Cannot estimate time-invariant variables
    • Only uses within variation (efficiency loss)
    • Requires strict exogeneity
  6. Practical Tools:

    • linearmodels.PanelOLS
    • Clustered standard errors (must use!)

Next Steps

In Section 4: Random Effects Models, we will learn:

  • RE model theory and GLS estimation
  • Choosing between FE vs RE (Hausman test)
  • When RE is better than FE

The power of differencing, the cornerstone of causal inference!

Released under the MIT License. Content © Author.