8.3 Fixed Effects Models
The Power of Differencing: The Ultimate Weapon for Eliminating Unobserved Heterogeneity
Section Objectives
- Deeply understand the mathematical principles of Fixed Effects (FE) models
- Master three FE estimation methods (within transformation, LSDV, first difference)
- Distinguish between one-way FE and two-way FE
- Use linearmodels.PanelOLS for FE regression
- Understand FE identification assumptions and causal interpretation
- Handle common pitfalls with FE (time-invariant variables, bad controls)
- Complete case study: Wage determinants (Mincer equation)
Core Idea of Fixed Effects Models
The Origin of the Problem: Omitted Variable Bias
Recall the example from the previous section: studying the effect of education on wages
True Model:

$$\log(wage_{it}) = \beta_0 + \beta_1\, education_{it} + \gamma\, ability_i + \epsilon_{it}$$

Problem:
- $ability_i$ is unobservable and cannot be directly measured
- $ability_i$ is correlated with $education_{it}$ (smart people get more education)
- If $ability_i$ is ignored, $\hat{\beta}_1$ will be biased
Traditional Solutions:
- Find a "perfect" proxy variable to measure ability → Nearly impossible
- Eliminate correlation through randomized experiments → Costly, impractical
Fixed Effects Solution: Use the time dimension of panel data to eliminate $ability_i$ through differencing
FE Core Intuition: Differencing Eliminates Fixed Effects
Suppose we observe the same person for two years:
Year 1: $\log(wage_{i1}) = \beta_0 + \beta_1\, education_{i1} + \gamma\, ability_i + \epsilon_{i1}$
Year 2: $\log(wage_{i2}) = \beta_0 + \beta_1\, education_{i2} + \gamma\, ability_i + \epsilon_{i2}$
Difference (Year 2 - Year 1):

$$\log(wage_{i2}) - \log(wage_{i1}) = \beta_1 (education_{i2} - education_{i1}) + (\epsilon_{i2} - \epsilon_{i1})$$

The Magic: $ability_i$ is eliminated!
Intuition:
- $ability_i$ doesn't change over the two years (it's fixed)
- After differencing, we only use within-individual variation over time
- Change in wage = change in education × return to education $\beta_1$
This is the essence of fixed effects!
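A tiny numeric sketch of this cancellation (made-up numbers; $\beta_1 = 0.08$):
# Two observations of the same person; 'ability' is fixed but unobserved
beta1, ability = 0.08, 0.7          # ability could be any constant
educ_y1, educ_y2 = 12, 14
log_wage_y1 = 1.5 + beta1 * educ_y1 + ability
log_wage_y2 = 1.5 + beta1 * educ_y2 + ability
# Differencing removes ability entirely, leaving only beta1
print(round((log_wage_y2 - log_wage_y1) / (educ_y2 - educ_y1), 4))  # 0.08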
Mathematical Expression of Fixed Effects Models
General FE Model

$$y_{it} = \beta_1 x_{1,it} + \beta_2 x_{2,it} + \cdots + \beta_K x_{K,it} + \alpha_i + \epsilon_{it}$$

Symbol Definitions:
- $i = 1, \dots, N$: Individual index
- $t = 1, \dots, T$: Time index
- $y_{it}$: Dependent variable
- $x_{k,it}$: $k$-th independent variable
- $\alpha_i$: Individual fixed effect (time-invariant, can be correlated with $x_{it}$)
- $\epsilon_{it}$: Random error term (idiosyncratic error)
Key Assumptions:
Strict Exogeneity: $E[\epsilon_{it} \mid x_{i1}, \dots, x_{iT}, \alpha_i] = 0$
Fixed effects are time-invariant: $\alpha_i$ carries no $t$ subscript
Allow $\alpha_i$ to be correlated with $x_{it}$: $\mathrm{Cov}(x_{it}, \alpha_i) \neq 0$ is permitted
Three Estimation Methods for Fixed Effects
Method 1: Within Transformation ⭐ Recommended
Core Idea: Demean each variable
Step 1: Calculate Individual Means

$$\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}, \qquad \bar{x}_i = \frac{1}{T} \sum_{t=1}^{T} x_{it}$$

Note: $\bar{\alpha}_i = \alpha_i$ (because $\alpha_i$ doesn't vary over time)
Step 2: Subtract the Mean Equation from the Original Equation
Original equation: $y_{it} = \beta x_{it} + \alpha_i + \epsilon_{it}$
Mean equation (averaging both sides over time): $\bar{y}_i = \beta \bar{x}_i + \alpha_i + \bar{\epsilon}_i$
Difference (demeaning): $\ddot{y}_{it} = \beta \ddot{x}_{it} + \ddot{\epsilon}_{it}$
where $\ddot{y}_{it} = y_{it} - \bar{y}_i$ is the demeaned variable
Key Result: $\alpha_i$ is eliminated!
Step 3: OLS Estimation of the Demeaned Equation

$$\hat{\beta}_{FE} = \frac{\sum_{i}\sum_{t} \ddot{x}_{it}\,\ddot{y}_{it}}{\sum_{i}\sum_{t} \ddot{x}_{it}^2}$$

This is the Fixed Effects Estimator (Within Estimator)
Python Implementation: Manual Within Transformation
import numpy as np
import pandas as pd
import statsmodels.api as sm
# Simulate data
np.random.seed(42)
data = []
for i in range(1, 101):  # 100 individuals
    alpha_i = np.random.normal(0, 1)  # Individual fixed effect
    for t in range(2015, 2020):  # 5 years
        x = 10 + t - 2015 + np.random.normal(0, 2)
        y = 5 + 2 * x + alpha_i + np.random.normal(0, 1)
        data.append({'id': i, 'year': t, 'y': y, 'x': x})
df = pd.DataFrame(data)
print("=" * 70)
print("Original Data")
print("=" * 70)
print(df.head(10))
# Manual within transformation
# Step 1: Calculate individual means
df['y_mean'] = df.groupby('id')['y'].transform('mean')
df['x_mean'] = df.groupby('id')['x'].transform('mean')
# Step 2: Demean
df['y_within'] = df['y'] - df['y_mean']
df['x_within'] = df['x'] - df['x_mean']
print("\n" + "=" * 70)
print("Demeaned Data (first 10 rows)")
print("=" * 70)
print(df[['id', 'year', 'y', 'y_within', 'x', 'x_within']].head(10))
# Step 3: OLS regression on demeaned data (no intercept!)
model_within = sm.OLS(df['y_within'], df['x_within']).fit()
print("\n" + "=" * 70)
print("Within Transformation FE Regression Results (manual implementation)")
print("=" * 70)
print(f"Coefficient: {model_within.params['x_within']:.4f}")
print(f"Standard error: {model_within.bse['x_within']:.4f}")
print(f"True parameter: 2.0000")Key Observation:
- After demeaning, each individual's mean becomes 0
- Only within variation remains
- No need for intercept in regression (intercept is 0 after demeaning)
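As a quick cross-check (a sketch that assumes df from the simulation above is still in scope), linearmodels.PanelOLS with entity_effects=True should reproduce the manual within estimate:
from linearmodels.panel import PanelOLS

# Professional implementation on the same data; should match x_within above
res_check = PanelOLS(df.set_index(['id', 'year'])['y'],
                     df.set_index(['id', 'year'])[['x']],
                     entity_effects=True).fit()
print(f"PanelOLS within estimate: {res_check.params['x']:.4f}")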
Method 2: Least Squares Dummy Variable (LSDV)
Core Idea: Add a dummy variable for each individual

$$y_{it} = \mu + \beta x_{it} + \sum_{j=2}^{N} \gamma_j D_{ij} + \epsilon_{it}$$

where $D_{ij} = 1$ if $i = j$, otherwise 0
Interpretation:
- $N - 1$ dummy variables (1st individual as reference group)
- $\gamma_j$: Individual $j$'s fixed effect difference relative to individual 1
- $\alpha_i = \mu + \gamma_i$ (if $i \geq 2$) or $\alpha_1 = \mu$ (reference group)
Pros:
- Can estimate each individual's fixed effect
- Equivalent to within transformation (coefficients are identical)
Cons:
- If $N$ is large, many parameters must be estimated ($N - 1 + K$)
- Slow computation (but results identical to within transformation)
Python Implementation:
import pandas as pd
import statsmodels.api as sm
# Create dummy variables (N-1); cast to float so statsmodels accepts them
dummies = pd.get_dummies(df['id'], prefix='id', drop_first=True, dtype=float)
# Merge data
X_lsdv = pd.concat([df[['x']], dummies], axis=1)
X_lsdv = sm.add_constant(X_lsdv)
# OLS regression
model_lsdv = sm.OLS(df['y'], X_lsdv).fit()
print("=" * 70)
print("LSDV Regression Results")
print("=" * 70)
print(f"x coefficient: {model_lsdv.params['x']:.4f}")
print(f"Number of estimated fixed effects: {len(dummies.columns)}")Note:
- LSDV and within transformation produce identical coefficients
- But standard errors may differ slightly (degrees of freedom adjustment)
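A one-line check (assuming model_within and model_lsdv from the code above are in scope) that the two slopes coincide:
import numpy as np

# Within-transformation slope and LSDV slope are the same number
print(np.isclose(model_within.params['x_within'], model_lsdv.params['x']))  # True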
Method 3: First Difference
Core Idea: Difference adjacent time periods

$$y_{it} - y_{i,t-1} = \beta (x_{it} - x_{i,t-1}) + (\epsilon_{it} - \epsilon_{i,t-1})$$

Abbreviated as: $\Delta y_{it} = \beta \Delta x_{it} + \Delta \epsilon_{it}$
Difference from Within Transformation:
- Within transformation: $y_{it} - \bar{y}_i$ (subtract mean of all periods)
- First difference: $y_{it} - y_{i,t-1}$ (subtract previous period's value)
When Are They Equivalent?
- If $T = 2$ (only two periods), the two methods are numerically identical (see the sketch below)
- If $T > 2$, the within transformation is usually more efficient
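A small sketch of the $T = 2$ equivalence (self-contained simulation): with exactly two periods, demeaning and first-differencing give numerically identical slope estimates:
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Two-period panel
rng = np.random.default_rng(5)
rows = []
for i in range(200):
    a_i = rng.normal()
    for t in (1, 2):
        x = rng.normal()
        rows.append({'id': i, 't': t, 'x': x, 'y': 2 * x + a_i + rng.normal()})
d = pd.DataFrame(rows)

# Within estimator (demeaning)
d['xw'] = d['x'] - d.groupby('id')['x'].transform('mean')
d['yw'] = d['y'] - d.groupby('id')['y'].transform('mean')
b_within = sm.OLS(d['yw'], d['xw']).fit().params['xw']

# First-difference estimator
d = d.sort_values(['id', 't'])
dx = d.groupby('id')['x'].diff().dropna()
dy = d.groupby('id')['y'].diff().dropna()
b_fd = sm.OLS(dy, dx).fit().params['x']
print(np.isclose(b_within, b_fd))  # True when T = 2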
When to Prefer First Difference?
- If the error terms follow a random walk
- If independent variables have severe serial correlation
Python Implementation:
# First difference
df_sorted = df.sort_values(['id', 'year'])
df_sorted['y_diff'] = df_sorted.groupby('id')['y'].diff()
df_sorted['x_diff'] = df_sorted.groupby('id')['x'].diff()
# Drop first period (no difference)
df_fd = df_sorted.dropna(subset=['y_diff', 'x_diff'])
# OLS regression (no intercept)
model_fd = sm.OLS(df_fd['y_diff'], df_fd['x_diff']).fit()
print("=" * 70)
print("First Difference FE Regression Results")
print("=" * 70)
print(f"Coefficient: {model_fd.params['x_diff']:.4f}")linearmodels.PanelOLS: Professional Tool
Basic Syntax
from linearmodels.panel import PanelOLS
# Set panel index
df_panel = df.set_index(['id', 'year'])
# Fixed effects regression
model = PanelOLS(
    dependent=df_panel['y'],
    exog=df_panel[['x1', 'x2']],
    entity_effects=True,   # Individual fixed effects
    time_effects=False     # Time fixed effects (optional)
).fit(
    cov_type='clustered',  # Standard error type
    cluster_entity=True    # Cluster at individual level
)
print(model)
Parameter Details
1. entity_effects: Individual Fixed Effects
# Enable individual fixed effects (recommended)
entity_effects=True
Equivalent to applying the within transformation to each variable
2. time_effects: Time Fixed Effects
# Enable time fixed effects
time_effects=True
Controls for common time trends across all individuals (macro shocks, policy changes, etc.)
3. cov_type: Standard Error Type
| Option | Meaning | When to Use |
|---|---|---|
| 'unadjusted' | Classical OLS SE | Teaching only |
| 'robust' | Heteroskedasticity-robust SE | When heteroskedasticity is present |
| 'clustered' | Clustered SE | Standard choice for panel data ⭐ |
| 'kernel' | Kernel (Driscoll-Kraay) HAC SE | Severe serial correlation over time |
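The sketch below (self-contained simulation; variable names are illustrative) refits the same FE model under several cov_type choices to show how the reported standard errors move:
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS

# Toy panel (column names are illustrative)
rng = np.random.default_rng(0)
rows = []
for i in range(50):
    a_i = rng.normal()
    for t in range(8):
        x = rng.normal()
        rows.append({'id': i, 'year': t, 'x': x, 'y': 2 * x + a_i + rng.normal()})
panel = pd.DataFrame(rows).set_index(['id', 'year'])

for cov in ['unadjusted', 'robust', 'clustered']:
    kwargs = {'cluster_entity': True} if cov == 'clustered' else {}
    res = PanelOLS(panel['y'], panel[['x']],
                   entity_effects=True).fit(cov_type=cov, **kwargs)
    print(f"{cov:>10}: coef = {res.params['x']:.4f}, SE = {res.std_errors['x']:.4f}")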
Recommended:
cov_type='clustered',
cluster_entity=True  # Cluster at individual level
Complete Example: Wage Determinants
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']
sns.set_style("whitegrid")
# Simulate data: wage panel
np.random.seed(123)
N = 500 # 500 workers
T = 7 # 7 years (2015-2021)
data = []
for i in range(N):
    # Individual fixed effect (ability, family background, etc.)
    ability = np.random.normal(0, 0.3)
    # Initial education level (correlated with ability: smart people get more
    # schooling, which is exactly what makes pooled OLS biased)
    probs = [0.1, 0.2, 0.3, 0.4] if ability > 0 else [0.4, 0.3, 0.2, 0.1]
    education_0 = np.random.choice([10, 12, 14, 16], p=probs)
    for t in range(T):
        year = 2015 + t
        # Education (may increase, e.g., night school)
        education = education_0 + (0.1 * t if np.random.rand() < 0.1 else 0)
        # Work experience
        experience = t
        # Union membership (may change over time)
        union = 1 if np.random.rand() < 0.3 else 0
        # Log wage
        # True parameters: education=0.08, experience=0.05, union=0.10
        log_wage = (1.5 + 0.08 * education + 0.05 * experience +
                    0.10 * union + ability + np.random.normal(0, 0.1))
        data.append({
            'id': i,
            'year': year,
            'log_wage': log_wage,
            'education': education,
            'experience': experience,
            'union': union,
            'ability': ability  # Actually unobservable
        })
df = pd.DataFrame(data)
print("=" * 70)
print("Data Summary")
print("=" * 70)
print(df[['log_wage', 'education', 'experience', 'union']].describe())
print(f"\nSample size: {len(df)}")
print(f"Number of individuals: {df['id'].nunique()}")
print(f"Number of time periods: {df['year'].nunique()}")
# Set panel index
df_panel = df.set_index(['id', 'year'])
# Method 1: Pooled OLS (biased)
import statsmodels.api as sm
X_pooled = sm.add_constant(df[['education', 'experience', 'union']])
model_pooled = sm.OLS(df['log_wage'], X_pooled).fit()
print("\n" + "=" * 70)
print("Method 1: Pooled OLS (omitting ability → biased)")
print("=" * 70)
print(model_pooled.summary().tables[1])
# Method 2: Fixed effects (unbiased)
model_fe = PanelOLS(
    df_panel['log_wage'],
    df_panel[['education', 'experience', 'union']],
    entity_effects=True
).fit(cov_type='clustered', cluster_entity=True)
print("\n" + "=" * 70)
print("Method 2: Fixed Effects (controlling for ability → unbiased)")
print("=" * 70)
print(model_fe)
# Compare estimates
print("\n" + "=" * 70)
print("Estimation Comparison")
print("=" * 70)
results_table = pd.DataFrame({
    'True Parameter': [0.08, 0.05, 0.10],
    'Pooled OLS': [model_pooled.params['education'],
                   model_pooled.params['experience'],
                   model_pooled.params['union']],
    'Fixed Effects': [model_fe.params['education'],
                      model_fe.params['experience'],
                      model_fe.params['union']]
}, index=['education', 'experience', 'union'])
print(results_table.round(4))
# Calculate bias
print("\nBias (Estimate - True Value):")
print((results_table[['Pooled OLS', 'Fixed Effects']] - results_table['True Parameter'].values[:, None]).round(4))
Output Interpretation:
- Pooled OLS: Overestimates education coefficient (because ability is omitted)
- Fixed Effects: Close to true value (differencing eliminates ability)
One-Way FE vs Two-Way FE
One-Way Fixed Effects
Model: $y_{it} = \beta x_{it} + \alpha_i + \epsilon_{it}$
Controls:
- Individual fixed effects $\alpha_i$ (individual heterogeneity)
Python Implementation:
model_oneway = PanelOLS(y, X, entity_effects=True).fit()
Two-Way Fixed Effects
Model: $y_{it} = \beta x_{it} + \alpha_i + \lambda_t + \epsilon_{it}$
Controls:
- Individual fixed effects $\alpha_i$ (individual heterogeneity)
- Time fixed effects $\lambda_t$ (macro trends)
Python Implementation:
model_twoway = PanelOLS(y, X,
                        entity_effects=True,
                        time_effects=True).fit()
When to Use Two-Way FE?
Scenario 1: Common Time Trends Exist
- Example: Business cycles, inflation, technological progress
- These factors affect all individuals but are unrelated to your independent variables
Scenario 2: DID Studies
- Two-way FE is standard practice for DID
- $\lambda_t$ controls for common time trends
Scenario 3: Avoiding Spurious Correlation
- If both $y$ and $x$ trend upward over time, the correlation might be due to common time factors
- Time FE eliminates such spurious correlation
Python Comparison: One-Way vs Two-Way FE
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS
# Simulate data: add time trend
np.random.seed(42)
data = []
for i in range(1, 101):
    alpha_i = np.random.normal(0, 1)
    for t in range(2010, 2020):
        # Time trend (affects all individuals)
        lambda_t = 0.1 * (t - 2010)
        # x also trends upward, so omitting the time effect biases one-way FE
        x = 10 + 0.5 * (t - 2010) + np.random.normal(0, 2)
        y = 5 + 2 * x + alpha_i + lambda_t + np.random.normal(0, 1)
        data.append({'id': i, 'year': t, 'y': y, 'x': x})
df = pd.DataFrame(data)
df_panel = df.set_index(['id', 'year'])
# One-way FE (control for individuals only)
model_oneway = PanelOLS(df_panel['y'], df_panel[['x']],
                        entity_effects=True).fit()
# Two-way FE (control for individuals + time)
model_twoway = PanelOLS(df_panel['y'], df_panel[['x']],
                        entity_effects=True,
                        time_effects=True).fit()
print("=" * 70)
print("One-Way FE vs Two-Way FE")
print("=" * 70)
print(f"True parameter: 2.0000")
print(f"One-way FE: {model_oneway.params['x']:.4f}")
print(f"Two-way FE: {model_twoway.params['x']:.4f}")Conclusion:
- If time trends exist and are not controlled, one-way FE may be biased
- Two-way FE simultaneously eliminates individual and time effects, more robust
Limitations of Fixed Effects
Limitation 1: Cannot Estimate Time-Invariant Variables
Problem: FE differencing eliminates all time-invariant variables
Examples:
- Gender
- Race
- Birthplace
- Industry (if no job changes)
Why?
- After differencing: $z_i - \bar{z}_i = 0$ for any time-invariant variable $z_i$ (always 0)
- Cannot estimate its coefficient
Solutions:
- Use Random Effects (RE) model (if RE assumption holds)
- Study interaction effects between time-invariant and time-varying variables
- Use the Mundlak method (add individual means of the time-varying variables to an RE model); see the sketch below
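A minimal sketch of the Mundlak device (self-contained simulation; variable names like female are illustrative): once the individual mean of the time-varying $x$ enters a Random Effects model, the coefficient on $x$ matches the within estimate, while the time-invariant variable remains estimable:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from linearmodels.panel import RandomEffects

# Self-contained simulation: x is correlated with the individual effect;
# 'female' is time-invariant, so FE could not estimate its coefficient
rng = np.random.default_rng(1)
rows = []
for i in range(300):
    a_i = rng.normal()
    female = int(rng.random() < 0.5)
    for t in range(5):
        x = a_i + rng.normal()
        y = 2 * x - 0.5 * female + a_i + rng.normal()
        rows.append({'id': i, 'year': t, 'y': y, 'x': x, 'female': female})
df_m = pd.DataFrame(rows)
# Mundlak device: add the individual mean of each time-varying regressor
df_m['x_bar'] = df_m.groupby('id')['x'].transform('mean')
panel_m = df_m.set_index(['id', 'year'])
exog = sm.add_constant(panel_m[['x', 'x_bar', 'female']])
res = RandomEffects(panel_m['y'], exog).fit()
# 'x' should be close to the within estimate (2), 'female' close to -0.5
print(res.params.round(3))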
Limitation 2: Only Uses Within Variation
Problem: FE discards between variation
Consequence:
- If independent variables have little within variation, FE estimates are inefficient
- Example: Education level rarely changes in the short term
Example:
# Calculate variation proportions
total_var = df['education'].var()
within_var = df.groupby('id')['education'].apply(lambda x: (x - x.mean()).var()).mean()
between_var = df.groupby('id')['education'].mean().var()
print(f"Total variation: {total_var:.2f}")
print(f"Within variation: {within_var:.2f} ({within_var/total_var*100:.1f}%)")
print(f"Between variation: {between_var:.2f} ({between_var/total_var*100:.1f}%)")If within variation < 10%:
- FE estimates will have large standard errors (imprecise)
- Consider using RE (if Hausman test passes)
Limitation 3: Strict Exogeneity Assumption
Assumption: $E[\epsilon_{it} \mid x_{i1}, x_{i2}, \dots, x_{iT}, \alpha_i] = 0$
Meaning:
- The error term is uncorrelated with the independent variables in all periods
- This covers not only the current $x_{it}$, but also past $x_{i,t-1}$ and future $x_{i,t+1}$ values
Scenarios Where Violated:
- Feedback Effect: $y_{it}$ affects $x_{i,t+1}$
- Simultaneity: $x_{it}$ and $y_{it}$ mutually affect each other
- Measurement Error: $x_{it}$ is measured with error
Solutions:
- Use instrumental variables (IV-FE)
- Use dynamic panel models (Arellano-Bond); the sketch below shows the bias they address
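A minimal sketch of why a lagged dependent variable violates strict exogeneity (self-contained simulation; parameter values are illustrative). The within estimator is biased downward here, which is the Nickell bias that Arellano-Bond estimators correct:
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS

# Dynamic panel: y_it = rho * y_i,t-1 + alpha_i + eps_it with rho = 0.5
rng = np.random.default_rng(7)
rho = 0.5
rows = []
for i in range(200):
    a_i = rng.normal()
    y = a_i / (1 - rho)  # start each series near its steady state
    for t in range(6):
        y = rho * y + a_i + rng.normal()
        rows.append({'id': i, 't': t, 'y': y})
df_d = pd.DataFrame(rows)
df_d['y_lag'] = df_d.groupby('id')['y'].shift(1)
panel_d = df_d.dropna().set_index(['id', 't'])
res = PanelOLS(panel_d['y'], panel_d[['y_lag']], entity_effects=True).fit()
# The within estimate is biased downward for small T (Nickell bias)
print(f"True rho: {rho}, within estimate: {res.params['y_lag']:.3f}")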
Limitation 4: Bad Control Problem
Problem: Including variables affected by treatment as controls
Example: Studying the effect of education on wages
# Wrong: occupation is a result of education (a mediator)
model = PanelOLS(log_wage, pd.concat([education, occupation], axis=1), entity_effects=True).fit()
Why Wrong?
- Education → Occupation → Wage (causal chain)
- Controlling for occupation blocks part of education's effect
- Estimates the direct effect rather than the total effect
Decision Rule:
- Control: Confounders (variables that affect both $x$ and $y$)
- Don't control: Mediators ($x \to M \to y$) or colliders (affected by both $x$ and $y$); the sketch below illustrates the mediator case
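A minimal sketch of the mediator problem (self-contained simulation; the coefficients are made up): conditioning on the mediator recovers only the direct effect, not the total effect:
import numpy as np
import statsmodels.api as sm

# Simulated causal chain: education -> occupation -> wage, plus a direct effect
rng = np.random.default_rng(3)
n = 5000
education = rng.normal(12, 2, n)
occupation = 0.5 * education + rng.normal(0, 1, n)   # mediator
wage = 0.05 * education + 0.08 * occupation + rng.normal(0, 1, n)
# Total effect of education: 0.05 + 0.5 * 0.08 = 0.09

no_control = sm.OLS(wage, sm.add_constant(education)).fit()
bad_control = sm.OLS(wage, sm.add_constant(np.column_stack([education, occupation]))).fit()
print(f"Without occupation (total effect ~0.09): {no_control.params[1]:.3f}")
print(f"With occupation (direct effect ~0.05):   {bad_control.params[1]:.3f}")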
Complete Case Study: Mincer Wage Equation
Background
Mincer Equation (1974) is the most classic model in labor economics:

$$\log(wage_{it}) = \beta_0 + \beta_1\, education_{it} + \beta_2\, experience_{it} + \beta_3\, experience_{it}^2 + \alpha_i + \epsilon_{it}$$

Interpretation:
- $\beta_1$: Returns to education (each additional year of education increases the wage by approximately $\beta_1 \times 100\%$)
- $\beta_2 > 0$, $\beta_3 < 0$: Nonlinear effect of experience (wages increase, then decrease)
Complete Python Implementation
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS, PooledOLS, compare
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']
# Simulate realistic Mincer data
np.random.seed(2024)
N = 1000 # 1000 workers
T = 10 # 10 years
data = []
for i in range(N):
    # Individual fixed effect (ability)
    ability = np.random.normal(0, 0.4)
    # Initial characteristics (education correlated with ability,
    # so pooled OLS overstates the return to education)
    probs = [0.1, 0.2, 0.3, 0.4] if ability > 0 else [0.3, 0.3, 0.2, 0.2]
    education_0 = np.random.choice([10, 12, 14, 16], p=probs)
    experience_0 = np.random.randint(0, 10)
    for t in range(T):
        year = 2010 + t
        # Education (may increase)
        education = education_0 + (0.5 if (t > 3 and np.random.rand() < 0.05) else 0)
        # Experience
        experience = experience_0 + t
        # Log wage (Mincer equation)
        log_wage = (1.8 + 0.08 * education + 0.05 * experience -
                    0.001 * experience**2 + ability + np.random.normal(0, 0.15))
        data.append({
            'id': i,
            'year': year,
            'log_wage': log_wage,
            'education': education,
            'experience': experience,
            'ability': ability
        })
df = pd.DataFrame(data)
df['experience_sq'] = df['experience'] ** 2
print("=" * 70)
print("Mincer Wage Equation: Panel Data Analysis")
print("=" * 70)
print(f"Sample size: {len(df):,}")
print(f"Number of individuals: {df['id'].nunique()}")
print(f"Time span: {df['year'].min()} - {df['year'].max()}")
# Set panel index
df_panel = df.set_index(['id', 'year'])
# Model 1: Pooled OLS (fit with linearmodels' PooledOLS so all three
# models can be compared with the same tooling below)
import statsmodels.api as sm
X1 = sm.add_constant(df_panel[['education', 'experience', 'experience_sq']])
model1 = PooledOLS(df_panel['log_wage'], X1).fit(cov_type='clustered', cluster_entity=True)
# Model 2: Fixed effects (one-way)
model2 = PanelOLS(df_panel['log_wage'],
                  df_panel[['education', 'experience', 'experience_sq']],
                  entity_effects=True).fit(cov_type='clustered',
                                           cluster_entity=True)
# Model 3: Fixed effects (two-way)
model3 = PanelOLS(df_panel['log_wage'],
                  df_panel[['education', 'experience', 'experience_sq']],
                  entity_effects=True,
                  time_effects=True).fit(cov_type='clustered',
                                         cluster_entity=True)
# Compare results
print("\n" + "=" * 70)
print("Regression Results Comparison")
print("=" * 70)
# statsmodels' summary_col cannot digest linearmodels results,
# so use linearmodels' own compare() utility
results = compare({'Pooled OLS': model1,
                   'One-Way FE': model2,
                   'Two-Way FE': model3},
                  precision='std_errors')
print(results)
# Calculate returns to education
print("\n" + "=" * 70)
print("Returns to Education Estimates")
print("=" * 70)
print(f"True parameter: 8.0%")
print(f"Pooled OLS: {model1.params['education']*100:.2f}% (overestimate)")
print(f"One-way FE: {model2.params['education']*100:.2f}%")
print(f"Two-way FE: {model3.params['education']*100:.2f}%")
# Visualize: experience-wage curve
experience_range = np.linspace(0, 30, 100)
wage_curve = (model2.params['experience'] * experience_range +
              model2.params['experience_sq'] * experience_range**2)
plt.figure(figsize=(10, 6))
plt.plot(experience_range, wage_curve * 100, linewidth=3, color='darkblue')
plt.xlabel('Work Experience (years)', fontweight='bold', fontsize=12)
plt.ylabel('Log Wage Change (%)', fontweight='bold', fontsize=12)
plt.title('Marginal Effect of Experience on Wages (Mincer Equation)', fontweight='bold', fontsize=14)
plt.grid(alpha=0.3)
plt.axhline(0, color='black', linewidth=0.8, linestyle='--')
plt.tight_layout()
plt.show()
# Calculate optimal experience
optimal_exp = -model2.params['experience'] / (2 * model2.params['experience_sq'])
print(f"\nExperience years at peak wage: {optimal_exp:.1f} years")Output Interpretation:
- Pooled OLS: Overestimates returns to education (omits ability)
- One-way FE: Controls for individual heterogeneity, close to true value
- Two-way FE: Further controls for time trends, most robust
- Experience curve: Increases then decreases (inverted U-shape)
Section Summary
Key Points
Essence of FE: Eliminate unobserved individual heterogeneity through differencing
Three Estimation Methods:
- Within transformation ⭐ Most common
- LSDV (dummy variables): Equivalent to within transformation
- First difference: Equivalent when $T = 2$; otherwise the within transformation is usually better
One-Way vs Two-Way FE:
- One-way: Control for individual heterogeneity
- Two-way: Control for both individual + time trends
Advantages of FE:
- Control for unobservables
- Allow $\alpha_i$ to be correlated with $x_{it}$
- Powerful tool for causal identification
Limitations of FE:
- Cannot estimate time-invariant variables
- Only uses within variation (efficiency loss)
- Requires strict exogeneity
Practical Tools:
- linearmodels.PanelOLS
- Clustered standard errors (must use!)
Next Steps
In Section 4: Random Effects Models, we will learn:
- RE model theory and GLS estimation
- Choosing between FE vs RE (Hausman test)
- When RE is better than FE
The power of differencing, the cornerstone of causal inference!