
5.6 Interpretation and Reporting

"If you torture the data long enough, it will confess to anything."— Ronald Coase, 1991 Nobel Laureate in Economics

From Regression Output to Academic Publication: Professionally Presenting Your Research



Section Objectives

After completing this section, you will be able to:

  • Correctly interpret coefficients from different model forms
  • Distinguish statistical significance from substantive significance
  • Produce publication-grade regression tables
  • Write standardized regression result reports
  • Visualize regression results
  • Understand the limitations of causal inference

The Art of Coefficient Interpretation

Four Classic Model Forms

| Model Form | Equation | Interpretation | Example |
| --- | --- | --- | --- |
| Level-Level | y = β₀ + β₁x | x increases by 1 unit, y increases by β₁ units | Each additional year of education increases wage by 2.5 thousand yuan |
| Log-Level | log(y) = β₀ + β₁x | x increases by 1 unit, y increases by approximately 100·β₁% | Each additional year of education increases wage by 8% |
| Level-Log | y = β₀ + β₁·log(x) | x increases by 1%, y increases by β₁/100 units | GDP increases by 1%, unemployed population decreases by 0.03 million |
| Log-Log | log(y) = β₀ + β₁·log(x) | x increases by 1%, y increases by β₁% (elasticity) | Price increases by 1%, demand decreases by 1.5% |

Level-Level Model

python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

# Generate data
np.random.seed(42)
n = 200
education = np.random.normal(13, 3, n)
wage = 10 + 2.5 * education + np.random.normal(0, 5, n)

df = pd.DataFrame({'wage': wage, 'education': education})

# Level-Level regression
model_ll = smf.ols('wage ~ education', data=df).fit()
print("Level-Level Model:")
print(model_ll.summary())

# Interpretation
beta_1 = model_ll.params['education']
print(f"\nInterpretation: Each additional year of education increases wage by {beta_1:.2f} thousand yuan/month")

Log-Level Model (Most Commonly Used)

python
# Log-Level regression
df['log_wage'] = np.log(df['wage'])
model_logl = smf.ols('log_wage ~ education', data=df).fit()
print("\nLog-Level Model:")
print(model_logl.summary())

# Interpretation (approximate)
beta_1_log = model_logl.params['education']
print(f"\nApproximate interpretation: Each additional year of education increases wage by approximately {beta_1_log*100:.2f}%")

# Exact interpretation
print(f"Exact interpretation: Each additional year of education increases wage by {(np.exp(beta_1_log)-1)*100:.2f}%")

When to Use Approximate vs Exact:

  • Small coefficients (roughly |β₁| < 0.1): the approximate and exact interpretations are nearly identical
  • Larger coefficients: use the exact interpretation, 100·(e^β₁ − 1)% (see the quick check below)
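
A quick numeric comparison makes the rule of thumb concrete (a minimal sketch; the β values here are illustrative, not taken from the models above):

python
import numpy as np

# Compare the approximate (100·β) and exact (100·(e^β − 1)) percentage interpretations
for beta in [0.01, 0.05, 0.10, 0.30, 0.50]:
    approx = 100 * beta
    exact = 100 * (np.exp(beta) - 1)
    print(f"β = {beta:.2f}: approx = {approx:5.1f}%, exact = {exact:5.1f}%, "
          f"difference = {exact - approx:.2f} pp")

The gap is negligible below about 0.1 and grows quickly beyond that, which is why the exact formula matters for large coefficients.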

Level-Log Model

python
# Level-Log regression
df['log_education'] = np.log(df['education'])
model_llevl = smf.ols('wage ~ log_education', data=df).fit()
print("\nLevel-Log Model:")
print(model_llevl.summary())

# Interpretation
beta_1_llevl = model_llevl.params['log_education']
print(f"\nInterpretation: Education increases by 1%, wage increases by {beta_1_llevl/100:.4f} thousand yuan")
print(f"Or: Education increases by 10%, wage increases by {beta_1_llevl*0.1:.3f} thousand yuan")

Log-Log Model (Elasticity Model)

python
# Log-Log regression
model_loglog = smf.ols('log_wage ~ log_education', data=df).fit()
print("\nLog-Log Model:")
print(model_loglog.summary())

# Interpretation
elasticity = model_loglog.params['log_education']
print(f"\nInterpretation: Education-wage elasticity = {elasticity:.3f}")
print(f"That is: Education increases by 1%, wage increases by {elasticity:.3f}%")

Visualizing Model Comparisons

python
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Level-Level
axes[0, 0].scatter(df['education'], df['wage'], alpha=0.5)
axes[0, 0].plot(df['education'], model_ll.fittedvalues, 'r-', linewidth=2)
axes[0, 0].set_xlabel('Education (years)')
axes[0, 0].set_ylabel('Wage (thousands)')
axes[0, 0].set_title('Level-Level: Wage = β₀ + β₁·Education')
axes[0, 0].grid(True, alpha=0.3)

# 2. Log-Level
axes[0, 1].scatter(df['education'], df['log_wage'], alpha=0.5)
axes[0, 1].plot(df['education'], model_logl.fittedvalues, 'r-', linewidth=2)
axes[0, 1].set_xlabel('Education (years)')
axes[0, 1].set_ylabel('log(Wage)')
axes[0, 1].set_title('Log-Level: log(Wage) = β₀ + β₁·Education')
axes[0, 1].grid(True, alpha=0.3)

# 3. Level-Log
axes[1, 0].scatter(df['log_education'], df['wage'], alpha=0.5)
axes[1, 0].plot(df['log_education'], model_llevl.fittedvalues, 'r-', linewidth=2)
axes[1, 0].set_xlabel('log(Education)')
axes[1, 0].set_ylabel('Wage (thousands)')
axes[1, 0].set_title('Level-Log: Wage = β₀ + β₁·log(Education)')
axes[1, 0].grid(True, alpha=0.3)

# 4. Log-Log
axes[1, 1].scatter(df['log_education'], df['log_wage'], alpha=0.5)
axes[1, 1].plot(df['log_education'], model_loglog.fittedvalues, 'r-', linewidth=2)
axes[1, 1].set_xlabel('log(Education)')
axes[1, 1].set_ylabel('log(Wage)')
axes[1, 1].set_title('Log-Log: log(Wage) = β₀ + β₁·log(Education)')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Publication-Grade Regression Tables

Stargazer-Style Tables with statsmodels' summary_col

python
from statsmodels.iolib.summary2 import summary_col

# Generate complete data
np.random.seed(123)
n = 500
education = np.random.normal(13, 3, n)
experience = np.random.uniform(0, 30, n)
female = np.random.binomial(1, 0.5, n)
married = np.random.binomial(1, 0.6, n)

log_wage = (1.5 + 0.08*education + 0.03*experience - 0.0005*experience**2
            - 0.15*female + 0.05*married + np.random.normal(0, 0.3, n))

df = pd.DataFrame({
    'log_wage': log_wage,
    'education': education,
    'experience': experience,
    'experience_sq': experience**2,
    'female': female,
    'married': married
})

# Estimate multiple models
model1 = smf.ols('log_wage ~ education', data=df).fit(cov_type='HC3')
model2 = smf.ols('log_wage ~ education + experience + I(experience**2)',
                 data=df).fit(cov_type='HC3')
model3 = smf.ols('log_wage ~ education + experience + I(experience**2) + female',
                 data=df).fit(cov_type='HC3')
model4 = smf.ols('log_wage ~ education + experience + I(experience**2) + female + married',
                 data=df).fit(cov_type='HC3')

# Create comparison table
results_table = summary_col(
    [model1, model2, model3, model4],
    model_names=['(1)', '(2)', '(3)', '(4)'],
    stars=True,
    float_format='%.3f',
    info_dict={
        'N': lambda x: f"{int(x.nobs)}",
        'R²': lambda x: f"{x.rsquared:.3f}",
        'Adj. R²': lambda x: f"{x.rsquared_adj:.3f}"
    }
)

print("Table 1: Wage Determination Equation (Dependent Variable: log(wage))")
print("="*80)
print(results_table)
print("="*80)
print("Note: Robust standard errors (HC3) in parentheses")
print("*** p<0.01, ** p<0.05, * p<0.1")

Output LaTeX Format

python
# Output as LaTeX
latex_table = results_table.as_latex()
print("\nLaTeX code:")
print(latex_table)

# Save to file
with open('regression_table.tex', 'w') as f:
    f.write(latex_table)
print("\nSaved to regression_table.tex")

Custom Table Style

python
# More professional table style
def create_regression_table(models, model_names, dependent_var, note=''):
    """
    Create professional regression table
    """
    results = summary_col(
        models,
        model_names=model_names,
        stars=True,
        float_format='%.4f'
    )

    # Add header and notes
    output = f"\nTable: Regression Analysis of {dependent_var}\n"
    output += "="*90 + "\n"
    output += str(results)
    output += "\n" + "="*90 + "\n"
    output += "Standard errors in parentheses. Robust standard errors (HC3) used.\n"
    output += "*** p<0.01, ** p<0.05, * p<0.1\n"
    if note:
        output += f"\nNote: {note}\n"

    return output

table = create_regression_table(
    [model1, model2, model3, model4],
    ['Model 1', 'Model 2', 'Model 3', 'Model 4'],
    'log(wage)',
    note='Sample includes 500 workers. Models (2)-(4) control for work experience and its square.'
)
print(table)

Writing Regression Result Reports

Complete Report Template

python
def generate_report(model, df, dep_var, title="Regression Analysis Report"):
    """
    Generate complete regression analysis report
    """
    report = f"\n{'='*80}\n"
    report += f"{title:^80}\n"
    report += f"{'='*80}\n\n"

    # 1. Model specification
    report += "1. Model Specification\n"
    report += "-" * 80 + "\n"
    report += f"Dependent variable: {dep_var}\n"
    report += f"Sample size: {int(model.nobs)}\n"
    report += f"Estimation method: OLS (robust standard errors)\n\n"

    # 2. Main findings
    report += "2. Main Findings\n"
    report += "-" * 80 + "\n"
    for var in model.params.index:
        if var == 'Intercept' or var == 'const':
            continue
        coef = model.params[var]
        se = model.bse[var]
        t = model.tvalues[var]
        p = model.pvalues[var]

        # Significance markers
        sig = '***' if p < 0.01 else ('**' if p < 0.05 else ('*' if p < 0.1 else ''))

        report += f"\n{var}:\n"
        report += f"  Coefficient = {coef:.4f}{sig} (SE = {se:.4f})\n"
        report += f"  t statistic = {t:.3f}, p-value = {p:.4f}\n"

        # Interpretation (assuming log-level model)
        if 'log' in dep_var.lower():
            pct_change = (np.exp(coef) - 1) * 100
            report += f"  Interpretation: {var} increases by 1 unit, {dep_var} increases by {pct_change:.2f}%\n"

    # 3. Model fit
    report += "\n3. Model Fit\n"
    report += "-" * 80 + "\n"
    report += f"R² = {model.rsquared:.4f}\n"
    report += f"Adjusted R² = {model.rsquared_adj:.4f}\n"
    report += f"F statistic = {model.fvalue:.2f} (p = {model.f_pvalue:.4f})\n"

    # 4. Diagnostic tests
    report += "\n4. Diagnostic Tests\n"
    report += "-" * 80 + "\n"

    # Heteroskedasticity test
    from statsmodels.stats.diagnostic import het_breuschpagan
    bp_test = het_breuschpagan(model.resid, model.model.exog)
    report += f"Breusch-Pagan test (heteroskedasticity): LM = {bp_test[0]:.3f}, p = {bp_test[1]:.4f}\n"

    # Normality test
    from statsmodels.stats.stattools import jarque_bera
    jb_test = jarque_bera(model.resid)
    report += f"Jarque-Bera test (normality): JB = {jb_test[0]:.3f}, p = {jb_test[1]:.4f}\n"

    # Autocorrelation test
    from statsmodels.stats.stattools import durbin_watson
    dw = durbin_watson(model.resid)
    report += f"Durbin-Watson statistic (autocorrelation): {dw:.3f}\n"

    report += "\n" + "="*80 + "\n"

    return report

# Generate report
report = generate_report(model4, df, 'log(wage)', title="Analysis of Wage Determination Equation")
print(report)

Academic Writing Example

markdown
## Empirical Results

Table 1 reports the estimation results for the wage determination equation. Column (1) shows
a simple regression of log(wage) on education, with an estimated return to education of 8.2%,
significant at the 1% level. This means that each additional year of education is
associated with an average wage increase of 8.2%.

Column (2) adds work experience and its square. The coefficient on experience is 0.030
(p < 0.01), and the coefficient on experience squared is -0.0005 (p < 0.01), indicating
that the wage-experience profile follows an inverted U-shape. The peak occurs at
approximately 30 years of experience. After controlling for experience, the education
return slightly decreases to 7.9% but remains highly significant.

Column (3) further controls for gender. The coefficient on female is -0.147 (p < 0.01),
indicating that after controlling for education and experience, female wages are
approximately 13.7% lower than male wages [= (exp(-0.147)-1)×100%]. This significant
gender wage gap may reflect labor market discrimination or unobserved productivity
differences.

Column (4) is the complete model, adding marital status. The coefficient on married is
0.052 (p < 0.05), indicating that married individuals earn approximately 5.3% more than
unmarried individuals. This "marriage premium" is widely documented in the labor economics
literature (Korenman & Neumark, 1991).

All models use HC3 robust standard errors to correct for potential heteroskedasticity.
Adjusted R² increases from 0.427 in model (1) to 0.583 in model (4), indicating that
added variables significantly improve the model's explanatory power.

Visualizing Regression Results

Coefficient Plot

python
# Extract coefficients and confidence intervals
coefs = model4.params.drop('Intercept')
ci = model4.conf_int(alpha=0.05).drop('Intercept')
ci_lower = ci[0]
ci_upper = ci[1]

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
y_pos = np.arange(len(coefs))

ax.errorbar(coefs, y_pos, xerr=[coefs - ci_lower, ci_upper - coefs],
            fmt='o', markersize=8, capsize=5, capthick=2, linewidth=2)
ax.axvline(x=0, color='red', linestyle='--', linewidth=1.5, alpha=0.7)
ax.set_yticks(y_pos)
ax.set_yticklabels(coefs.index)
ax.set_xlabel('Coefficient Estimate')
ax.set_title('Regression Coefficients with 95% Confidence Intervals')
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

Marginal Effects Plot

python
# Marginal effect of experience (considering quadratic term)
def marginal_effect_exp(exp_values, model):
    beta_exp = model.params['experience']
    beta_exp2 = model.params['I(experience ** 2)']
    return beta_exp + 2 * beta_exp2 * exp_values

exp_range = np.linspace(0, 40, 100)
me = marginal_effect_exp(exp_range, model4)

plt.figure(figsize=(10, 6))
plt.plot(exp_range, me, linewidth=2)
plt.axhline(y=0, color='r', linestyle='--', alpha=0.5)
plt.xlabel('Work Experience (years)')
plt.ylabel('Marginal Effect of Experience (on log(wage))')
plt.title('Marginal Effect of Work Experience on Wage')
plt.grid(True, alpha=0.3)

# Mark peak
peak_exp = -model4.params['experience'] / (2 * model4.params['I(experience ** 2)'])
plt.axvline(x=peak_exp, color='green', linestyle=':', alpha=0.7,
           label=f'Peak at experience = {peak_exp:.1f} years')
plt.legend()
plt.show()

Predicted Wage Distribution

python
# Predicted wages for different groups
scenarios = pd.DataFrame({
    'education': [12, 16, 16, 18],
    'experience': [5, 10, 10, 15],
    'female': [0, 0, 1, 0],
    'married': [0, 1, 1, 1],
    'label': ['High school graduate male', 'College male, married', 'College female, married', 'Graduate male, married']
})

# Predict
scenarios['log_wage_pred'] = model4.predict(scenarios)
scenarios['wage_pred'] = np.exp(scenarios['log_wage_pred'])

# Confidence intervals for the mean predicted log(wage)
predictions = model4.get_prediction(scenarios)
pred_summary = predictions.summary_frame(alpha=0.05)
scenarios['ci_lower'] = np.exp(pred_summary['mean_ci_lower'])
scenarios['ci_upper'] = np.exp(pred_summary['mean_ci_upper'])

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
y_pos = np.arange(len(scenarios))

ax.barh(y_pos, scenarios['wage_pred'], alpha=0.7)
ax.errorbar(scenarios['wage_pred'], y_pos,
           xerr=[scenarios['wage_pred'] - scenarios['ci_lower'],
                 scenarios['ci_upper'] - scenarios['wage_pred']],
           fmt='none', ecolor='black', capsize=5)

ax.set_yticks(y_pos)
ax.set_yticklabels(scenarios['label'])
ax.set_xlabel('Predicted Wage (thousands/month)')
ax.set_title('Predicted Wages for Different Groups with 95% Confidence Intervals')
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

print("Prediction results:")
print(scenarios[['label', 'wage_pred', 'ci_lower', 'ci_upper']])

Statistical Significance vs Substantive Significance

Problem: Misuse of p-values

Common Misconceptions:

  • "p < 0.001, therefore the effect is very large"
  • "p > 0.05, therefore there is no effect"

Correct Understanding:

  • Statistical significance: Strength of evidence that effect is nonzero
  • Substantive significance: Whether the effect size is practically important

Case Study

python
# Simulate large sample data
np.random.seed(999)
n_large = 10000

education_large = np.random.normal(13, 3, n_large)
# True effect is very small: 0.005 (0.5%)
log_wage_large = 2.5 + 0.005*education_large + np.random.normal(0, 0.3, n_large)

df_large = pd.DataFrame({'log_wage': log_wage_large, 'education': education_large})
model_large = smf.ols('log_wage ~ education', data=df_large).fit()

print("Large sample regression:")
print(f"Sample size: {n_large}")
print(f"Education coefficient: {model_large.params['education']:.6f}")
print(f"p-value: {model_large.pvalues['education']:.6f}")
print(f"95% confidence interval: [{model_large.conf_int().loc['education', 0]:.6f}, "
      f"{model_large.conf_int().loc['education', 1]:.6f}]")

# Substantive meaning
effect_pct = model_large.params['education'] * 100
print(f"\nSubstantive interpretation: Each additional year of education increases wage by {effect_pct:.2f}%")
print("Although statistically significant, the actual effect is extremely small (less than 1%), of little substantive importance")

Assessing Substantive Significance

Standards (vary by field):

  • Cohen's d (effect size)
  • R² increment (see the sketch after the Cohen's d example below)
  • Domain expert judgment
python
# Calculate Cohen's d
def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))
    return (np.mean(group1) - np.mean(group2)) / pooled_std

# Case: Gender wage gap
male_wage = df[df['female'] == 0]['log_wage']
female_wage = df[df['female'] == 1]['log_wage']

d = cohens_d(male_wage, female_wage)
print(f"Cohen's d = {d:.3f}")

# Interpretation
if abs(d) < 0.2:
    print("Effect size: Small")
elif abs(d) < 0.5:
    print("Effect size: Medium")
else:
    print("Effect size: Large")

Limitations of Causal Inference

OLS Regression ≠ Causal Effect

Conditions for Causal Inference:

  1. Randomized Controlled Trial (RCT)
  2. Natural Experiment
  3. Instrumental Variables (IV)
  4. Difference-in-Differences (DID)
  5. Regression Discontinuity (RDD)

Limitations of OLS Regression:

  • Omitted variable bias
  • Reverse causality
  • Selection bias

Case: Causal Effect of Education on Wage

Problem: estimate the return to education β₁ in log(wage) = β₀ + β₁·education + u, where unobserved ability is part of the error term u.

Sources of Bias:

  1. Omitted variables: Ability

    • High ability → More education
    • High ability → Higher wage
    • ⇒ the OLS estimate of β₁ is biased upward (see the formula after this list)
  2. Reverse causality: Expected wage → Education choice

  3. Measurement error: Quality differences in education
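
The direction of the ability bias follows from the standard omitted variable bias formula (one included regressor, one omitted variable):

plim β̂₁ = β₁ + β_ability · Cov(education, ability) / Var(education)

Because high ability raises both schooling (Cov > 0) and wages (β_ability > 0), the second term is positive, so OLS overstates the return to education.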

A Classic Identification Strategy: Instrumental Variables

python
# Simulate IV estimation
np.random.seed(2024)
n = 1000

# Latent ability (unobservable)
ability = np.random.normal(0, 1, n)

# Instrument: Birth quarter (Angrist & Krueger, 1991)
# Assume those born later attend more school due to compulsory education laws
birth_quarter = np.random.choice([1, 2, 3, 4], n)
instrument = (birth_quarter == 4).astype(int)

# Education (endogenous)
education_iv = 12 + 1.5*ability + 0.5*instrument + np.random.normal(0, 2, n)

# Wage (true causal effect = 0.05)
log_wage_iv = 2.0 + 0.05*education_iv + 0.20*ability + np.random.normal(0, 0.3, n)

df_iv = pd.DataFrame({
    'log_wage': log_wage_iv,
    'education': education_iv,
    'instrument': instrument,
    'ability': ability  # Unobservable in reality
})

# OLS (biased)
model_ols = smf.ols('log_wage ~ education', data=df_iv).fit()
print("OLS estimate (biased):")
print(f"Education coefficient = {model_ols.params['education']:.4f}")

# IV estimation (consistent)
from linearmodels.iv import IV2SLS
iv_model = IV2SLS.from_formula('log_wage ~ 1 + [education ~ instrument]',
                                data=df_iv).fit()
print("\nIV estimate (consistent):")
print(f"Education coefficient = {iv_model.params['education']:.4f}")

print(f"\nTrue causal effect: 0.05")
print(f"OLS upward bias: {model_ols.params['education'] - 0.05:.4f}")

Complete Case: Publication-Grade Paper

Research Question

Title: Gender Differences in Returns to Education: Evidence from China's Labor Market

Research Questions:

  1. What is the return to education on wages?
  2. Are there gender differences in returns to education?
  3. How do these differences vary across education levels?

Data and Methods

python
# Generate complete dataset
np.random.seed(20250128)
n = 2000

education = np.random.normal(13, 3, n)
experience = np.random.uniform(0, 30, n)
female = np.random.binomial(1, 0.5, n)
region = np.random.choice(['East', 'Central', 'West'], n, p=[0.4, 0.3, 0.3])
married = np.random.binomial(1, 0.6, n)

# DGP: Gender differences in education returns
region_effects = [{'East': 0.15, 'Central': 0.05, 'West': 0}[r] for r in region]
log_wage = (1.5 + 0.08*education + 0.03*experience - 0.0005*experience**2 +
            0.10*female - 0.015*education*female + np.array(region_effects) +
            0.06*married + np.random.normal(0, 0.3, n))

df_final = pd.DataFrame({
    'log_wage': log_wage,
    'education': education,
    'experience': experience,
    'female': female,
    'region': region,
    'married': married
})

# Descriptive statistics
print("Table 2: Descriptive Statistics")
print("="*80)
desc_stats = df_final.describe().T[['mean', 'std', 'min', 'max']]
print(desc_stats)

# By gender
print("\nBy gender:")
print(df_final.groupby('female')[['education', 'experience', 'log_wage']].mean())

Regression Analysis

python
# Models 1-4
m1 = smf.ols('log_wage ~ education', data=df_final).fit(cov_type='HC3')
m2 = smf.ols('log_wage ~ education + experience + I(experience**2)',
             data=df_final).fit(cov_type='HC3')
m3 = smf.ols('log_wage ~ education + experience + I(experience**2) + female',
             data=df_final).fit(cov_type='HC3')
m4 = smf.ols('log_wage ~ education * female + experience + I(experience**2) + C(region) + married',
             data=df_final).fit(cov_type='HC3')

# Output table
print("\nTable 3: Wage Determination Equation")
table = summary_col([m1, m2, m3, m4],
                   model_names=['(1)', '(2)', '(3)', '(4)'],
                   stars=True)
print(table)

Visualizing Main Results

python
# Plot interaction effects
edu_range = np.linspace(6, 20, 50)

# Male
male_pred = m4.predict(pd.DataFrame({
    'education': edu_range,
    'female': 0,
    'experience': 10,
    'region': 'East',
    'married': 1
}))

# Female
female_pred = m4.predict(pd.DataFrame({
    'education': edu_range,
    'female': 1,
    'experience': 10,
    'region': 'East',
    'married': 1
}))

plt.figure(figsize=(10, 6))
plt.plot(edu_range, male_pred, 'b-', linewidth=2, label='Male')
plt.plot(edu_range, female_pred, 'r-', linewidth=2, label='Female')
plt.xlabel('Years of Education')
plt.ylabel('Predicted log(wage)')
plt.title('Gender Differences in Education-Wage Relationship')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Calculate gender wage gap at different education levels
for edu in [10, 13, 16]:
    gap = (m4.params['female'] +
           m4.params['education:female'] * edu)
    gap_pct = (np.exp(gap) - 1) * 100
    print(f"Education = {edu} years: Gender wage gap = {gap_pct:.1f}%")

Section Summary

Key Points

| Topic | Key Point |
| --- | --- |
| Coefficient Interpretation | Level-Level, Log-Level, Level-Log, Log-Log |
| Significance | Statistical significance ≠ substantive importance |
| Causal Inference | OLS ≠ causation; an identification strategy is needed |
| Academic Writing | Clear, standardized, complete |

Paper Writing Checklist

  • [ ] Clearly state research question
  • [ ] Describe data sources and variable definitions
  • [ ] Report descriptive statistics
  • [ ] Explain estimation method (OLS, IV, robust SE)
  • [ ] Present multiple model specifications
  • [ ] Interpret main coefficients (magnitude, significance, substantive meaning)
  • [ ] Conduct robustness checks
  • [ ] Discuss causal identification strategy
  • [ ] Visualize main results
  • [ ] Discuss limitations

Further Reading

Academic Writing Guides

  1. Angrist & Pischke (2010). "The Credibility Revolution in Empirical Economics"
  2. Abadie (2020). "Statistical Non-Significance in Empirical Economics"
  3. Imbens (2021). "Statistical Significance, p-Values, and the Reporting of Uncertainty"

Causal Inference Classics

  1. Angrist & Pischke (2009). Mostly Harmless Econometrics
  2. Pearl & Mackenzie (2018). The Book of Why
  3. Cunningham (2021). Causal Inference: The Mixtape

Congratulations! You have completed the entire Regression Analysis chapter!

Next Steps: Learn advanced econometric methods (panel data, instrumental variables, difference-in-differences, etc.)

Continue exploring other chapters in StatsPai!
