8.4 Random Effects Models
The Trade-off Between Efficiency and Consistency: When is Random Effects Better Than Fixed Effects?
Section Objectives
- Understand the mathematical principles of Random Effects (RE) models
- Master GLS estimation methods
- Distinguish core assumption differences between FE and RE
- Implement the Hausman test to choose between FE and RE
- Understand RE's efficiency advantage and consistency risk
- Use linearmodels.RandomEffects for RE regression
- Complete case study: Corporate capital structure determinants
Core Idea of Random Effects
FE vs RE: Key Differences
Fixed Effects (FE):
- $u_i$ is a fixed parameter (each individual has its own parameter)
- Allows $u_i$ to be correlated with $X_{it}$: $\text{Cov}(u_i, X_{it}) \neq 0$
- Estimation method: within transformation (demeaning) to eliminate $u_i$
Random Effects (RE):
- $u_i$ is a random variable, $u_i \sim \text{iid}(0, \sigma_u^2)$
- Assumes $u_i$ is uncorrelated with $X_{it}$: $\text{Cov}(u_i, X_{it}) = 0$ ⭐
- Estimation method: Generalized Least Squares (GLS)
Why Called "Random" Effects?
Intuition:
- FE: $u_i$ is an inherent characteristic of individual $i$ (fixed)
- RE: $u_i$ is randomly drawn from a population distribution (random)
Statistical Meaning:
- FE: $u_1, u_2, \ldots, u_N$ are $N$ parameters to be estimated
- RE: $u_i$ doesn't need estimation; only its variance $\sigma_u^2$ needs to be estimated
Analogy:
- FE: Like a "fixed intercept" model, each individual has its own intercept
- RE: Like a "hierarchical model", individual intercepts follow a distribution
Mathematical Expression of Random Effects Model
Standard RE Model

$$y_{it} = \beta_0 + \beta_1 x_{it} + u_i + \varepsilon_{it}$$

Symbol Definitions:
- $u_i \sim \text{iid}(0, \sigma_u^2)$: Individual random effect (unobservable)
- $\varepsilon_{it} \sim \text{iid}(0, \sigma_\varepsilon^2)$: Random error term
- $u_i$ and $\varepsilon_{it}$ are independent of each other
- Key assumption: $\text{Cov}(u_i, x_{it}) = 0$ (exogeneity)
Composite Error Term:

$$v_{it} = u_i + \varepsilon_{it}$$

Therefore, the model can be written as:

$$y_{it} = \beta_0 + \beta_1 x_{it} + v_{it}$$
Variance-Covariance Structure of Composite Error
Variance:

$$\text{Var}(v_{it}) = \sigma_u^2 + \sigma_\varepsilon^2$$

Covariance (same individual, different times, $t \neq s$):

$$\text{Cov}(v_{it}, v_{is}) = \sigma_u^2$$

Intra-class Correlation:

$$\rho = \text{Corr}(v_{it}, v_{is}) = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}$$

Interpretation:
- $\rho$: Correlation between different time observations of the same individual
- $\rho = 0$: No individual effects, reduces to pooled OLS
- $\rho = 1$: Errors completely determined by individual effects
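As a quick numerical check of the intra-class correlation formula, the following minimal sketch simulates composite errors with assumed $\sigma_u = \sigma_\varepsilon = 1$ (so the true $\rho = 0.5$) and estimates $\rho$ empirically:

```python
import numpy as np

# A minimal sketch: verify rho = sigma_u^2 / (sigma_u^2 + sigma_eps^2)
# on simulated composite errors (assumed sigma_u = sigma_eps = 1, true rho = 0.5)
rng = np.random.default_rng(42)
N, T = 1000, 5
u = rng.normal(0, 1, size=(N, 1))      # individual effects, one per entity
eps = rng.normal(0, 1, size=(N, T))    # idiosyncratic errors
v = u + eps                            # composite error v_it = u_i + eps_it

# Empirical correlation between two different periods of the same individual
rho_hat = np.corrcoef(v[:, 0], v[:, 1])[0, 1]
print(f"True rho: {1 / (1 + 1):.3f}, estimated rho: {rho_hat:.3f}")
```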
Random Effects Estimation: GLS
Why Can't We Use OLS?
Problem: The composite error $v_{it}$ has serial correlation
- Errors at different times for the same individual are correlated: $\text{Cov}(v_{it}, v_{is}) = \sigma_u^2 > 0$
- Violates the OLS independence assumption
Consequences:
- OLS coefficients are still unbiased (if $\text{Cov}(u_i, x_{it}) = 0$)
- But standard errors are biased (underestimated) → $t$-statistics are inflated
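To see the standard-error problem concretely, the following minimal sketch (simulation parameters are illustrative assumptions) fits pooled OLS twice on a panel whose composite error contains an individual effect: once with naive iid standard errors, once with entity-clustered standard errors:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Composite errors with an individual effect make naive OLS standard
# errors too small; entity-clustered SEs correct this.
rng = np.random.default_rng(0)
N, T = 200, 5
ids = np.repeat(np.arange(N), T)
u = np.repeat(rng.normal(0, 1, N), T)                 # individual effect u_i
# x varies mostly between entities but is uncorrelated with u
x = np.repeat(rng.normal(0, 1, N), T) + 0.5 * rng.normal(size=N * T)
y = 1 + 2 * x + u + rng.normal(0, 1, N * T)

X = sm.add_constant(pd.Series(x, name='x'))
naive = sm.OLS(y, X).fit()                            # treats all errors as iid
clustered = sm.OLS(y, X).fit(cov_type='cluster', cov_kwds={'groups': ids})
print(f"Naive SE on x:     {naive.bse['x']:.4f}")     # too small
print(f"Clustered SE on x: {clustered.bse['x']:.4f}") # larger, honest
```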
Generalized Least Squares (GLS)
Core Idea: Transform data so that transformed errors satisfy OLS assumptions
Step 1: Construct Transformation

Quasi-Demeaning Transformation:

$$y_{it} - \theta \bar{y}_i = \beta_0 (1 - \theta) + \beta_1 (x_{it} - \theta \bar{x}_i) + (v_{it} - \theta \bar{v}_i)$$

where:

$$\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T \sigma_u^2}}$$

Special Cases:
- If $\sigma_u^2 = 0$ (no individual effects): $\theta = 0$ → pooled OLS
- If $T \sigma_u^2 \gg \sigma_\varepsilon^2$ (individual effects dominate): $\theta \to 1$ → fixed effects (within transformation)
Intuition:
- RE is a weighted average of FE and pooled OLS
- The weight $\theta$ depends on the relative importance of individual effects
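To make the weight concrete, here is a quick numerical sketch (the variance values are illustrative assumptions, not from the text):

```python
import numpy as np

def theta(sigma_u2, sigma_e2, T):
    """Quasi-demeaning weight for the RE (GLS) transformation."""
    return 1 - np.sqrt(sigma_e2 / (sigma_e2 + T * sigma_u2))

# theta grows with the share of individual-effect variance (and with T):
# 0 means pooled OLS, values near 1 approach the FE within transformation
for su2 in [0.0, 0.5, 1.0, 10.0]:
    print(f"sigma_u^2 = {su2:>4}: theta = {theta(su2, 1.0, T=5):.3f}")
```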
Step 2: OLS Estimation on Transformed Data
This is the Random Effects Estimator (RE Estimator)
Feasible GLS (FGLS)
Problem: $\theta$ depends on the unknown variance components $\sigma_u^2$ and $\sigma_\varepsilon^2$

Solution: Two-step estimation

Step 1: Estimate variance components
- Run pooled OLS or FE, obtain residuals
- Calculate $\hat{\sigma}_u^2$ and $\hat{\sigma}_\varepsilon^2$
Step 2: Use the estimated variances to calculate $\hat{\theta}$, then perform GLS

Python Implementation: linearmodels performs FGLS automatically; a manual sketch follows below
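For intuition about what the library does under the hood, here is a hand-rolled FGLS sketch on simulated data, compared against linearmodels.RandomEffects. The variance-component estimators below are simple moment-based choices; packages use slightly different small-sample corrections, so the two estimates should be close but need not match exactly:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from linearmodels.panel import PanelOLS, RandomEffects

# Simulated balanced panel where the RE assumption holds by construction
rng = np.random.default_rng(7)
N, T = 300, 6
ids = np.repeat(np.arange(N), T)
u = np.repeat(rng.normal(0, 1, N), T)
x = rng.normal(0, 1, N * T)
y = 1 + 2 * x + u + rng.normal(0, 1, N * T)
df = pd.DataFrame({'id': ids, 't': np.tile(np.arange(T), N), 'y': y, 'x': x})
dfp = df.set_index(['id', 't'])

# Step 1a: sigma_eps^2 from FE (within) residuals
fe = PanelOLS(dfp['y'], dfp[['x']], entity_effects=True).fit()
sigma_e2 = float(fe.resids.pow(2).sum() / (N * T - N - 1))

# Step 1b: sigma_u^2 from pooled OLS residual variance minus sigma_eps^2
pooled = sm.OLS(df['y'], sm.add_constant(df['x'])).fit()
sigma_u2 = max(pooled.resid.var() - sigma_e2, 0.0)

# Step 2: quasi-demean with the estimated theta and run OLS
theta = 1 - np.sqrt(sigma_e2 / (sigma_e2 + T * sigma_u2))
gm = df.groupby('id')[['y', 'x']].transform('mean')
y_star = df['y'] - theta * gm['y']
X_star = pd.DataFrame({'const': 1 - theta, 'x': df['x'] - theta * gm['x']})
manual = sm.OLS(y_star, X_star).fit()

re = RandomEffects(dfp['y'], dfp[['x']]).fit()
print(f"theta = {theta:.3f}")
print(f"Manual FGLS beta_x:   {manual.params['x']:.4f}")
print(f"RandomEffects beta_x: {re.params['x']:.4f}")
```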
FE vs RE: In-Depth Comparison
Comparison Table
| Dimension | Fixed Effects (FE) | Random Effects (RE) |
|---|---|---|
| Individual Effect | $u_i$ is a fixed parameter | $u_i$ is a random variable |
| Core Assumption | Allows $u_i$ correlated with $X_{it}$ | Requires $\text{Cov}(u_i, X_{it}) = 0$ |
| Estimation Method | Within transformation | GLS |
| Variation Used | Within only | Within + between |
| Efficiency | Relatively low (only uses within variation) | Relatively high (uses all variation) |
| Consistency | Consistent (even if $u_i$ correlated with $X_{it}$) | Consistent only when $\text{Cov}(u_i, X_{it}) = 0$ |
| Time-Invariant Variables | Cannot estimate | Can estimate |
| Applicable Scenarios | $u_i$ correlated with $X_{it}$ (endogeneity) | $u_i$ uncorrelated with $X_{it}$ (exogeneity) |
Core Trade-off: Efficiency vs Consistency
Consistency:
- As the sample size increases, the estimator converges to the true parameter
Efficiency:
- The estimator has smaller variance (smaller standard errors)
Trade-off:
- FE: Consistent (even with endogeneity), but less efficient (only uses within variation)
- RE: More efficient (uses all variation), but inconsistent if $u_i$ is correlated with $X_{it}$
Decision Rule:
- If $\text{Cov}(u_i, X_{it}) = 0$: RE is better (more efficient)
- If $u_i$ is correlated with $X_{it}$: FE is better (consistent)
- Key question: how do we decide which holds? → Hausman Test
Hausman Test: Decision Tool for FE vs RE
Logic of Hausman Test
Core Question: Is $u_i$ correlated with $X_{it}$?

Null Hypothesis: $H_0: \text{Cov}(u_i, X_{it}) = 0$ (both FE and RE are consistent; RE is efficient)

Alternative Hypothesis: $H_1: \text{Cov}(u_i, X_{it}) \neq 0$ (only FE is consistent)

Test Statistic:

$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[\widehat{\text{Var}}(\hat{\beta}_{FE}) - \widehat{\text{Var}}(\hat{\beta}_{RE})\right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \sim \chi^2(k)$$

where $k$ is the number of independent variables

Intuition:
- If $H_0$ holds ($\text{Cov}(u_i, X_{it}) = 0$), both FE and RE are consistent, so their estimates should be close
- If $H_1$ holds ($u_i$ correlated with $X_{it}$), FE is consistent but RE is not, so the estimates will differ significantly
Decision Rule:
- $p < 0.05$: Reject $H_0$ → Use FE
- $p \geq 0.05$: Fail to reject $H_0$ → Use RE
Python Implementation: Hausman Test
```python
import numpy as np
import pandas as pd
from scipy.stats import chi2
from linearmodels.panel import PanelOLS, RandomEffects, compare

# Simulate data: u_i correlated with X (FE should win)
np.random.seed(123)
N = 200
T = 5
data = []
for i in range(N):
    # Individual effect
    u_i = np.random.normal(0, 1)
    for t in range(T):
        # X correlated with u_i (violates the RE assumption!)
        x = 10 + 0.5 * u_i + np.random.normal(0, 2)
        y = 5 + 2 * x + u_i + np.random.normal(0, 1)
        data.append({'id': i, 'year': 2015 + t, 'y': y, 'x': x})

df = pd.DataFrame(data)
df_panel = df.set_index(['id', 'year'])

# Estimate FE and RE
model_fe = PanelOLS(df_panel['y'], df_panel[['x']],
                    entity_effects=True).fit(cov_type='clustered',
                                             cluster_entity=True)
model_re = RandomEffects(df_panel['y'], df_panel[['x']]).fit()

print("=" * 70)
print("FE vs RE Estimation Results")
print("=" * 70)
print("True parameter: 2.0000")
print(f"FE estimate: {model_fe.params['x']:.4f}")
print(f"RE estimate: {model_re.params['x']:.4f}")

# Hausman test (manual implementation, single regressor)
# Note: the textbook statistic assumes conventional covariances; with
# clustered FE covariances the variance difference can turn negative.
beta_diff = model_fe.params['x'] - model_re.params['x']
var_diff = model_fe.cov.loc['x', 'x'] - model_re.cov.loc['x', 'x']
hausman_stat = (beta_diff ** 2) / var_diff
p_value = 1 - chi2.cdf(hausman_stat, df=1)

print("\n" + "=" * 70)
print("Hausman Test")
print("=" * 70)
print(f"H statistic: {hausman_stat:.3f}")
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("Conclusion: Reject H0, should use FE (RE is inconsistent)")
else:
    print("Conclusion: Fail to reject H0, should use RE "
          "(consistent and more efficient)")

# Use the linearmodels built-in comparison function
print("\n" + "=" * 70)
print("linearmodels Built-in Comparison")
print("=" * 70)
comparison = compare({'FE': model_fe, 'RE': model_re})
print(comparison)
```

Output Interpretation:
- If $p < 0.05$: FE and RE differ significantly → Use FE
- If $p \geq 0.05$: FE and RE don't differ significantly → Use RE (more efficient)
Practical Recommendations
Conservative Strategy (recommended):
- Report both FE and RE
- Conduct Hausman test
- Prioritize FE for main results (because endogeneity is common)
Exception Cases (may prioritize RE):
- Education research: students randomly assigned to schools
- Medical research: patients randomly assigned to hospitals
- Survey research: individuals randomly sampled from population
Economics Research:
- Typically use FE (because endogeneity almost always exists)
- RE often used for robustness checks
linearmodels.RandomEffects
Basic Syntax
```python
from linearmodels.panel import RandomEffects

# Set panel index
df_panel = df.set_index(['id', 'year'])

# Random effects regression
model_re = RandomEffects(
    dependent=df_panel['y'],
    exog=df_panel[['x1', 'x2']]
).fit()
print(model_re)
```

Complete Example: Corporate Capital Structure
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2
from linearmodels.panel import PanelOLS, PooledOLS, RandomEffects, compare

# Simulate corporate panel data
np.random.seed(2024)
N = 300  # 300 companies
T = 10   # 10 years
data = []
for i in range(N):
    # Company fixed effect (management style, industry characteristics, etc.)
    company_effect = np.random.normal(0, 0.1)
    # Industry (time-invariant)
    industry = np.random.choice(['Manufacturing', 'Services', 'Technology'],
                                p=[0.4, 0.3, 0.3])
    for t in range(T):
        year = 2010 + t
        # Profitability (ROA), correlated with the company effect
        roa = 0.05 + company_effect * 0.5 + np.random.normal(0, 0.02)
        # Company size (log(assets))
        log_assets = 10 + 0.1 * t + np.random.normal(0, 0.5)
        # Growth opportunities (Tobin's Q)
        tobins_q = 1.5 + np.random.normal(0, 0.3)
        # Leverage (dependent variable)
        # True parameters: roa=-0.3, log_assets=0.05, tobins_q=-0.1
        leverage = (0.3 - 0.3 * roa + 0.05 * log_assets -
                    0.1 * tobins_q + company_effect + np.random.normal(0, 0.05))
        data.append({
            'company_id': i, 'year': year, 'leverage': leverage,
            'roa': roa, 'log_assets': log_assets,
            'tobins_q': tobins_q, 'industry': industry
        })

df = pd.DataFrame(data)
# Industry dummies (dtype=float so linearmodels receives numeric columns)
df = pd.get_dummies(df, columns=['industry'], drop_first=True, dtype=float)

print("=" * 70)
print("Corporate Capital Structure Study")
print("=" * 70)
print(f"Sample size: {len(df):,}")
print(f"Number of companies: {df['company_id'].nunique()}")
print(f"Time span: {df['year'].min()} - {df['year'].max()}")

# Set panel index
df_panel = df.set_index(['company_id', 'year'])
exog_vars = ['roa', 'log_assets', 'tobins_q']

# Model 1: Pooled OLS (PooledOLS, so all models can go into compare())
model_pooled = PooledOLS(df_panel['leverage'],
                         sm.add_constant(df_panel[exog_vars])).fit()

# Model 2: Fixed effects
model_fe = PanelOLS(df_panel['leverage'], df_panel[exog_vars],
                    entity_effects=True).fit(cov_type='clustered',
                                             cluster_entity=True)

# Model 3: Random effects
model_re = RandomEffects(df_panel['leverage'], df_panel[exog_vars]).fit()

# Model 4: RE + industry dummies (RE can estimate time-invariant variables)
model_re_industry = RandomEffects(
    df_panel['leverage'],
    df_panel[exog_vars + ['industry_Services', 'industry_Technology']]
).fit()

# Hausman test (textbook form; assumes conventional covariances)
beta_diff = model_fe.params - model_re.params
var_diff = model_fe.cov - model_re.cov
hausman_stat = float(beta_diff.T @ np.linalg.inv(var_diff) @ beta_diff)
p_value = 1 - chi2.cdf(hausman_stat, df=len(beta_diff))

print("\n" + "=" * 70)
print("Hausman Test")
print("=" * 70)
print(f"H statistic: {hausman_stat:.3f}")
print(f"p-value: {p_value:.4f}")
print(f"Conclusion: {'Use FE' if p_value < 0.05 else 'Use RE'}")

# Compare results
print("\n" + "=" * 70)
print("Regression Results Comparison")
print("=" * 70)
print(compare({'Pooled OLS': model_pooled, 'FE': model_fe, 'RE': model_re}))

print("\n" + "=" * 70)
print("RE + Industry Dummies (Utilizing RE's Advantage)")
print("=" * 70)
print(model_re_industry.summary)

# Interpret coefficients (leverage and ROA are both in decimal form)
print("\n" + "=" * 70)
print("Economic Interpretation")
print("=" * 70)
print(f"ROA coefficient (FE): {model_fe.params['roa']:.4f}")
print(" → A 1 percentage point increase in ROA changes leverage by "
      f"{model_fe.params['roa']:.2f} percentage points")
print(f"\nlog(assets) coefficient (FE): {model_fe.params['log_assets']:.4f}")
print(" → Doubling company size (log assets up by 0.693) changes leverage by "
      f"{model_fe.params['log_assets'] * 0.693 * 100:.2f} percentage points")
```

Output Interpretation:
- Hausman Test: If rejected, use FE; otherwise use RE
- RE's Advantage: Can estimate industry dummies (time-invariant)
- Economic Meaning:
- Negative ROA coefficient: High-profit companies reduce debt (pecking order theory)
- Positive size coefficient: Large companies easier to obtain debt financing
RE's Advantage Scenarios
Scenario 1: Estimating Time-Invariant Variables
Example: Studying gender wage gap
```python
# FE cannot estimate the gender coefficient (time-invariant):
# the within transformation absorbs any variable that is constant per individual.
# model_fe = PanelOLS(df_panel['log_wage'], df_panel[['education', 'gender']],
#                     entity_effects=True).fit()  # → gender dropped/absorbed

# RE keeps between variation, so the gender coefficient is identified
# (sketch; assumes df_panel has hypothetical columns log_wage, education, gender)
model_re = RandomEffects(df_panel['log_wage'],
                         df_panel[['education', 'gender']]).fit()
```

Note: Only if gender is uncorrelated with the individual effect is the RE estimate consistent.
Scenario 2: Small Within Variation
Example: Studying effect of education on wages (short panel)
If the panel's time span is short (e.g., 2-3 years), education barely changes within individuals:
- FE only uses within variation (almost zero) → large standard errors
- RE also uses between variation (large) → more precise estimates
Trade-off (illustrated in the sketch below):
- FE: Consistent but imprecise
- RE: Precise but possibly inconsistent (if endogeneity exists)
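A minimal simulation (parameters are illustrative assumptions; the RE exogeneity assumption holds here by construction) makes the precision gap visible:

```python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS, RandomEffects

# When a regressor barely varies within individuals, FE standard errors
# blow up while RE stays precise.
rng = np.random.default_rng(1)
N, T = 300, 3
ids = np.repeat(np.arange(N), T)
educ = np.repeat(rng.normal(12, 2, N), T) + 0.05 * rng.normal(size=N * T)
u = np.repeat(rng.normal(0, 1, N), T)   # individual effect, uncorrelated with educ
log_wage = 1 + 0.1 * educ + u + rng.normal(0, 0.5, N * T)

dfp = pd.DataFrame({'id': ids, 't': np.tile(np.arange(T), N),
                    'log_wage': log_wage, 'educ': educ}).set_index(['id', 't'])
fe = PanelOLS(dfp['log_wage'], dfp[['educ']], entity_effects=True).fit()
re = RandomEffects(dfp['log_wage'], dfp[['educ']]).fit()
print(f"FE: beta = {fe.params['educ']:.3f}, SE = {fe.std_errors['educ']:.3f}")
print(f"RE: beta = {re.params['educ']:.3f}, SE = {re.std_errors['educ']:.3f}")
```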
Scenario 3: Random Sampling
Example: Randomly sample 100 schools from nationwide schools
If schools are randomly sampled, the school effect is unlikely to be correlated with student characteristics:
- The RE assumption is more plausible
- RE estimation is more efficient
Contrast:
- If studying a specific, non-randomly chosen set of 100 schools, FE is more appropriate
Section Summary
Key Points
Essence of RE:
- Individual effects $u_i$ are random variables drawn from a distribution
- Core assumption: $\text{Cov}(u_i, X_{it}) = 0$ (exogeneity)
GLS Estimation:
- Quasi-demeaning transformation: $\theta$ depends on the variance ratio
- RE is a weighted average of FE and pooled OLS
FE vs RE:
- Efficiency: RE > FE (uses all variation)
- Consistency: FE always consistent; RE consistent only when $\text{Cov}(u_i, X_{it}) = 0$
- Time-invariant variables: FE cannot estimate, RE can
Hausman Test:
- Tests whether $u_i$ is correlated with $X_{it}$
- $p < 0.05$: Use FE
- $p \geq 0.05$: Use RE
Practical Recommendations:
- Economics research: Prioritize FE (endogeneity common)
- Education/medical research: Consider RE (random sampling)
- Robustness check: Report both FE and RE
RE's Advantage Scenarios:
- Need to estimate time-invariant variables
- Small within variation
- Individual random sampling
Decision Tree
Start
 ↓
Need to estimate time-invariant variables?
 ├─ Yes → Use RE (if the Hausman test passes)
 └─ No → Estimate both FE and RE, conduct Hausman test
           ↓
       Hausman test p < 0.05?
        ├─ Yes → Use FE (RE is inconsistent)
        └─ No → Use RE (more efficient)

Next Steps
In Section 8.5: Advanced Panel Data Topics, we will learn:
- Two-way fixed effects (Two-Way FE) detailed explanation
- Correct use of clustered standard errors
- Dynamic panel models (Arellano-Bond)
- Handling unbalanced panels
Choose wisely between efficiency and consistency!