5.6

Interpretation & Reporting

"If you torture the data long enough, it will confess to anything." (Ronald Coase, 1991 Nobel Laureate in Economics)

From Regression Output to Academic Paper: Presenting Your Research Professionally

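Section Goals

After completing this section, you will be able to:

Interpret coefficients correctly under different functional forms
Distinguish statistical significance from substantive (practical) significance
Produce publication-quality regression tables
Write a standard report of regression results
Visualize regression results
Understand the limits of causal inference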

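The Art of Coefficient Interpretation

Four Classic Functional Forms

| Model form | Specification | Interpretation of β₁ | Example |
|---|---|---|---|
| Level-Level | y = β₀ + β₁·x | x rises by 1 unit → y changes by β₁ units | Each additional year of education raises the wage by 2.5 thousand yuan |
| Log-Level | log(y) = β₀ + β₁·x | x rises by 1 unit → y changes by about 100·β₁ % | Each additional year of education raises the wage by 8% |
| Level-Log | y = β₀ + β₁·log(x) | x rises by 1% → y changes by about β₁/100 units | A 1% rise in GDP reduces the number of unemployed by 0.03 million |
| Log-Log | log(y) = β₀ + β₁·log(x) | x rises by 1% → y changes by about β₁ % (elasticity) | A 1% price increase lowers demand by 1.5% |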

Level-Level Model

python

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

# Simulate data (wage in thousand yuan per month)
np.random.seed(42)
n = 200
education = np.random.normal(13, 3, n)
wage = 10 + 2.5 * education + np.random.normal(0, 5, n)

df = pd.DataFrame({'wage': wage, 'education': education})

# Level-Level regression
model_ll = smf.ols('wage ~ education', data=df).fit()
print("Level-Level model:")
print(model_ll.summary())

# Interpretation
beta_1 = model_ll.params['education']
print(f"\nInterpretation: each additional year of education raises the monthly wage by {beta_1:.2f} thousand yuan")

Log-Level Model (the Most Common)

python

# Log-Level regression
df['log_wage'] = np.log(df['wage'])
model_logl = smf.ols('log_wage ~ education', data=df).fit()
print("\nLog-Level model:")
print(model_logl.summary())

# Interpretation (approximate)
beta_1_log = model_logl.params['education']
print(f"\nApproximate interpretation: each additional year of education raises the wage by about {beta_1_log*100:.2f}%")

# Exact interpretation
print(f"Exact interpretation: each additional year of education raises the wage by {(np.exp(beta_1_log)-1)*100:.2f}%")

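When to use the approximate vs. the exact interpretation:

When |β₁| is small: the approximate (100·β₁ %) and exact (100·(e^β₁ − 1) %) changes are nearly identical
When |β₁| is large: use the exact interpretation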
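To see where the two readings part ways, the sketch below tabulates the approximate effect 100·β₁ against the exact effect 100·(e^β₁ − 1) for a few illustrative coefficient values (not estimates from the data above). The helper name `log_level_effects` is hypothetical.

```python
import numpy as np

def log_level_effects(beta):
    """Approximate and exact % change in y for a one-unit increase in x
    in the log-level model log(y) = b0 + beta * x + u."""
    approx = 100 * beta                 # first-order approximation
    exact = 100 * (np.exp(beta) - 1)    # exact percentage change
    return approx, exact

# Illustrative coefficient values only
for beta in [0.01, 0.05, 0.10, 0.20, 0.50, -0.30]:
    approx, exact = log_level_effects(beta)
    print(f"beta = {beta:+.2f}: approx {approx:+7.2f}%, "
          f"exact {exact:+7.2f}%, gap {exact - approx:+.2f} pp")
```

For β₁ = 0.01 the two differ by about 0.005 percentage points, for β₁ = 0.10 by about 0.5, and for β₁ = 0.50 by nearly 15. For negative coefficients the approximation overstates the decline (−30% versus about −25.9% at β₁ = −0.30).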

Level-Log Model

python

# Level-Log regression
df['log_education'] = np.log(df['education'])
model_llevl = smf.ols('wage ~ log_education', data=df).fit()
print("\nLevel-Log model:")
print(model_llevl.summary())

# Interpretation
beta_1_llevl = model_llevl.params['log_education']
print(f"\nInterpretation: a 1% increase in years of education raises the wage by {beta_1_llevl/100:.4f} thousand yuan")
print(f"Or: a 10% increase in years of education raises the wage by about {beta_1_llevl*0.1:.3f} thousand yuan")

Log-Log Model (Elasticity Model)

python

# Log-Log regression
model_loglog = smf.ols('log_wage ~ log_education', data=df).fit()
print("\nLog-Log model:")
print(model_loglog.summary())

# Interpretation
elasticity = model_loglog.params['log_education']
print(f"\nInterpretation: education-wage elasticity = {elasticity:.3f}")
print(f"i.e., a 1% increase in years of education raises the wage by about {elasticity:.3f}%")
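The 1% readings above are first-order approximations. For larger changes in x, the exact effects follow directly from the model: in Level-Log, Δy = β₁·ln(1 + p/100); in Log-Log, %Δy = 100·[(1 + p/100)^β₁ − 1]. A minimal sketch with hypothetical coefficient values (the function names are illustrative, not from any library):

```python
import numpy as np

def level_log_change(beta, pct_x):
    """Level-Log: exact change in y (in y's units) when x rises by pct_x percent."""
    return beta * np.log(1 + pct_x / 100)

def log_log_change(beta, pct_x):
    """Log-Log: exact percent change in y when x rises by pct_x percent."""
    return 100 * ((1 + pct_x / 100) ** beta - 1)

beta_level_log = 30.0  # hypothetical Level-Log coefficient
elasticity = 0.8       # hypothetical Log-Log elasticity

for p in [1, 10, 50]:
    print(f"x +{p}%: Level-Log Δy = {level_log_change(beta_level_log, p):.3f} "
          f"(approx. {beta_level_log * p / 100:.3f}); "
          f"Log-Log %Δy = {log_log_change(elasticity, p):.3f}% "
          f"(approx. {elasticity * p:.3f}%)")
```

At p = 1 the approximations are essentially exact; at p = 50 the Level-Log approximation gives 15.00 where the exact change is about 12.16.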

Visualizing the Model Comparison

python

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Level-Level
axes[0, 0].scatter(df['education'], df['wage'], alpha=0.5)
axes[0, 0].plot(df['education'], model_ll.fittedvalues, 'r-', linewidth=2)
axes[0, 0].set_xlabel('Education (years)')
axes[0, 0].set_ylabel('Wage (thousand yuan)')
axes[0, 0].set_title('Level-Level: Wage = β₀ + β₁·Education')
axes[0, 0].grid(True, alpha=0.3)

# 2. Log-Level
axes[0, 1].scatter(df['education'], df['log_wage'], alpha=0.5)
axes[0, 1].plot(df['education'], model_logl.fittedvalues, 'r-', linewidth=2)
axes[0, 1].set_xlabel('Education (years)')
axes[0, 1].set_ylabel('log(Wage)')
axes[0, 1].set_title('Log-Level: log(Wage) = β₀ + β₁·Education')
axes[0, 1].grid(True, alpha=0.3)

# 3. Level-Log
axes[1, 0].scatter(df['log_education'], df['wage'], alpha=0.5)
axes[1, 0].plot(df['log_education'], model_llevl.fittedvalues, 'r-', linewidth=2)
axes[1, 0].set_xlabel('log(Education)')
axes[1, 0].set_ylabel('Wage (thousand yuan)')
axes[1, 0].set_title('Level-Log: Wage = β₀ + β₁·log(Education)')
axes[1, 0].grid(True, alpha=0.3)

# 4. Log-Log
axes[1, 1].scatter(df['log_education'], df['log_wage'], alpha=0.5)
axes[1, 1].plot(df['log_education'], model_loglog.fittedvalues, 'r-', linewidth=2)
axes[1, 1].set_xlabel('log(Education)')
axes[1, 1].set_ylabel('log(Wage)')
axes[1, 1].set_title('Log-Log: log(Wage) = β₀ + β₁·log(Education)')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Publication-Quality Regression Tables

Using summary_col (a stargazer-style table in Python)

python

from statsmodels.iolib.summary2 import summary_col

# Simulate a fuller dataset
np.random.seed(123)
n = 500
education = np.random.normal(13, 3, n)
experience = np.random.uniform(0, 30, n)
female = np.random.binomial(1, 0.5, n)
married = np.random.binomial(1, 0.6, n)

log_wage = (1.5 + 0.08*education + 0.03*experience - 0.0005*experience**2 
            - 0.15*female + 0.05*married + np.random.normal(0, 0.3, n))

df = pd.DataFrame({
    'log_wage': log_wage,
    'education': education,
    'experience': experience,
    'experience_sq': experience**2,
    'female': female,
    'married': married
})

# Estimate several nested models (HC3 robust standard errors)
model1 = smf.ols('log_wage ~ education', data=df).fit(cov_type='HC3')
model2 = smf.ols('log_wage ~ education + experience + I(experience**2)', 
                 data=df).fit(cov_type='HC3')
model3 = smf.ols('log_wage ~ education + experience + I(experience**2) + female', 
                 data=df).fit(cov_type='HC3')
model4 = smf.ols('log_wage ~ education + experience + I(experience**2) + female + married', 
                 data=df).fit(cov_type='HC3')

# Build the side-by-side comparison table
results_table = summary_col(
    [model1, model2, model3, model4],
    model_names=['(1)', '(2)', '(3)', '(4)'],
    stars=True,
    float_format='%.3f',
    info_dict={
        'N': lambda x: f"{int(x.nobs)}",
        'R²': lambda x: f"{x.rsquared:.3f}",
        'Adj. R²': lambda x: f"{x.rsquared_adj:.3f}"
    }
)

print("Table 1: Wage equations (dependent variable: log(wage))")
print("="*80)
print(results_table)
print("="*80)
print("Note: Robust (HC3) standard errors in parentheses.")
print("*** p<0.01, ** p<0.05, * p<0.1")

Exporting to LaTeX

python

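# Export as LaTeX
latex_table = results_table.as_latex()
print("\nLaTeX code:")
print(latex_table)

# Save to file (UTF-8, since the table contains non-ASCII characters such as R²)
with open('regression_table.tex', 'w', encoding='utf-8') as f:
    f.write(latex_table)
print("\nSaved to regression_table.tex")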

Customizing the Table Style

python

# A more polished table layout
def create_regression_table(models, model_names, dependent_var, note=''):
    """
    Build a formatted regression table with a title and footnotes.
    The footnote assumes all models use HC3 robust standard errors.
    """
    results = summary_col(
        models,
        model_names=model_names,
        stars=True,
        float_format='%.4f'
    )
    
    # Add title and notes
    output = f"\nTable: Regression analysis of {dependent_var}\n"
    output += "="*90 + "\n"
    output += str(results)
    output += "\n" + "="*90 + "\n"
    output += "Standard errors in parentheses. Robust standard errors (HC3).\n"
    output += "*** p<0.01, ** p<0.05, * p<0.1\n"
    if note:
        output += f"\nNote: {note}\n"
    
    return output

table = create_regression_table(
    [model1, model2, model3, model4],
    ['Model 1', 'Model 2', 'Model 3', 'Model 4'],
    'log(wage)',
    note='The sample contains 500 workers. Models (2)-(4) control for work experience and its square.'
)
print(table)

Writing Up Regression Results

A Complete Report Template

python

def generate_report(model, df, dep_var, title="Regression Analysis Report"):
    """
    Generate a complete plain-text regression analysis report.
    """
    report = f"\n{'='*80}\n"
    report += f"{title:^80}\n"
    report += f"{'='*80}\n\n"
    
    # 1. Model specification
    report += "1. Model specification\n"
    report += "-" * 80 + "\n"
    report += f"Dependent variable: {dep_var}\n"
    report += f"Sample size: {int(model.nobs)}\n"
    report += f"Estimation method: OLS (covariance type: {model.cov_type})\n\n"
    
    # 2. Main findings
    report += "2. Main findings\n"
    report += "-" * 80 + "\n"
    for var in model.params.index:
        if var == 'Intercept' or var == 'const':
            continue
        coef = model.params[var]
        se = model.bse[var]
        t = model.tvalues[var]
        p = model.pvalues[var]
        
        # Significance stars
        sig = '***' if p < 0.01 else ('**' if p < 0.05 else ('*' if p < 0.1 else ''))
        
        report += f"\n{var}:\n"
        report += f"  Coefficient = {coef:.4f}{sig} (SE = {se:.4f})\n"
        report += f"  t statistic = {t:.3f}, p-value = {p:.4f}\n"
        
        # Interpretation (assumes a log-level model). Skipped for squared and
        # interaction terms, whose effects must be read jointly with other terms.
        if 'log' in dep_var.lower() and '**' not in var and ':' not in var:
            pct_change = (np.exp(coef) - 1) * 100
            report += (f"  Interpretation: a 1-unit increase in {var} is associated with a "
                       f"{pct_change:.2f}% change in the unlogged outcome\n")
    
    # 3. Goodness of fit
    report += "\n3. Goodness of fit\n"
    report += "-" * 80 + "\n"
    report += f"R² = {model.rsquared:.4f}\n"
    report += f"Adjusted R² = {model.rsquared_adj:.4f}\n"
    report += f"F statistic = {model.fvalue:.2f} (p = {model.f_pvalue:.4f})\n"
    
    # 4. Diagnostic tests
    report += "\n4. Diagnostic tests\n"
    report += "-" * 80 + "\n"
    
    # Heteroskedasticity test
    from statsmodels.stats.diagnostic import het_breuschpagan
    bp_test = het_breuschpagan(model.resid, model.model.exog)
    report += f"Breusch-Pagan test (heteroskedasticity): LM = {bp_test[0]:.3f}, p = {bp_test[1]:.4f}\n"
    
    # Normality test
    from statsmodels.stats.stattools import jarque_bera
    jb_test = jarque_bera(model.resid)
    report += f"Jarque-Bera test (normality): JB = {jb_test[0]:.3f}, p = {jb_test[1]:.4f}\n"
    
    # Autocorrelation test (meaningful only when observations have a natural order, e.g. time series)
    from statsmodels.stats.stattools import durbin_watson
    dw = durbin_watson(model.resid)
    report += f"Durbin-Watson statistic (autocorrelation): {dw:.3f}\n"
    
    report += "\n" + "="*80 + "\n"
    
    return report

# Generate the report
report = generate_report(model4, df, 'log(wage)', title="Wage Equation Analysis")
print(report)

Academic Writing Example

markdown

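## Empirical Results

Table 1 reports estimates of the wage equation. Column (1) presents a simple regression
of log wages on education. The estimated return to education is 8.2% and is significant
at the 1% level: each additional year of schooling is associated with a wage increase of
about 8.2% on average.

Column (2) adds work experience and its square. The coefficient on experience is 0.030
(p < 0.01) and that on experience squared is -0.0005 (p < 0.01), implying an inverted-U
wage-experience profile that peaks at about 30 years of experience [= 0.030 / (2 × 0.0005)].
After controlling for experience, the return to education falls slightly to 7.9% but
remains highly significant.

Column (3) further controls for gender. The coefficient on female is -0.147 (p < 0.01):
holding education and experience constant, women earn about 13.7% less than men
[= (exp(-0.147) - 1) × 100%]. This sizable gender wage gap may reflect labor-market
discrimination or unobserved productivity differences.

Column (4), the full model, adds marital status. The coefficient on married is 0.052
(p < 0.05), meaning married workers earn about 5.3% more than unmarried workers; this
"marriage premium" is widely documented in the labor economics literature
(Korenman & Neumark, 1991).

All models use HC3 robust standard errors to guard against heteroskedasticity. Adjusted R²
rises from 0.427 in model (1) to 0.583 in model (4), indicating that the added variables
substantially improve the model's explanatory power.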
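The back-of-the-envelope numbers in the write-up can be reproduced directly from the quoted coefficients. The second half is a hedged sketch of backing the claim about added variables with a formal joint test rather than comparing adjusted R² alone: it fits a model to freshly simulated data (assumed to follow the same design as Table 1, so the numbers will not match the text) and applies statsmodels' `f_test`, which uses the fitted model's HC3 covariance.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# 1) Arithmetic behind the sample write-up (coefficients as quoted in the text)
b_exp, b_exp2 = 0.030, -0.0005
b_female, b_married = -0.147, 0.052

peak = -b_exp / (2 * b_exp2)                    # turning point of the quadratic
female_gap = (np.exp(b_female) - 1) * 100       # exact % gap for a dummy in a log model
married_premium = (np.exp(b_married) - 1) * 100
print(f"Peak experience: {peak:.1f} years")
print(f"Female gap: {female_gap:.1f}%")
print(f"Marriage premium: {married_premium:.1f}%")

# 2) Joint test of the regressors added in columns (3)-(4), on simulated data
rng = np.random.default_rng(0)
n = 500
sim = pd.DataFrame({
    'education': rng.normal(13, 3, n),
    'experience': rng.uniform(0, 30, n),
    'female': rng.binomial(1, 0.5, n),
    'married': rng.binomial(1, 0.6, n),
})
sim['log_wage'] = (1.5 + 0.08 * sim['education'] + 0.03 * sim['experience']
                   - 0.0005 * sim['experience'] ** 2 - 0.15 * sim['female']
                   + 0.05 * sim['married'] + rng.normal(0, 0.3, n))
full = smf.ols('log_wage ~ education + experience + I(experience**2) + female + married',
               data=sim).fit(cov_type='HC3')
joint = full.f_test('female = 0, married = 0')  # H0: both coefficients are zero
f_stat = float(np.squeeze(joint.fvalue))
p_val = float(np.squeeze(joint.pvalue))
print(f"Joint test (female, married): F = {f_stat:.2f}, p = {p_val:.4g}")
```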

Visualizing Regression Results

Coefficient Plot

python

# Extract coefficients and confidence intervals
coefs = model4.params.drop('Intercept')
ci = model4.conf_int(alpha=0.05).drop('Intercept')
ci_lower = ci[0]
ci_upper = ci[1]

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
y_pos = np.arange(len(coefs))

ax.errorbar(coefs, y_pos, xerr=[coefs - ci_lower, ci_upper - coefs],
            fmt='o', markersize=8, capsize=5, capthick=2, linewidth=2)
ax.axvline(x=0, color='red', linestyle='--', linewidth=1.5, alpha=0.7)
ax.set_yticks(y_pos)
ax.set_yticklabels(coefs.index)
ax.set_xlabel('Coefficient estimate')
ax.set_title('Regression coefficients with 95% confidence intervals')
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

Marginal Effect Plot

python

# Marginal effect of experience (accounting for the squared term):
# d log(wage) / d experience = beta_exp + 2 * beta_exp2 * experience
def marginal_effect_exp(exp_values, model):
    beta_exp = model.params['experience']
    beta_exp2 = model.params['I(experience ** 2)']
    return beta_exp + 2 * beta_exp2 * exp_values

# Note: the sample covers 0-30 years of experience; values beyond 30 are extrapolation
exp_range = np.linspace(0, 40, 100)
me = marginal_effect_exp(exp_range, model4)

plt.figure(figsize=(10, 6))
plt.plot(exp_range, me, linewidth=2)
plt.axhline(y=0, color='r', linestyle='--', alpha=0.5)
plt.xlabel('Work experience (years)')
plt.ylabel('Marginal effect of experience (on log(wage))')
plt.title('Marginal effect of work experience on wages')
plt.grid(True, alpha=0.3)

# Mark the peak (where the marginal effect crosses zero)
peak_exp = -model4.params['experience'] / (2 * model4.params['I(experience ** 2)'])
plt.axvline(x=peak_exp, color='green', linestyle=':', alpha=0.7, 
           label=f'Peak at experience = {peak_exp:.1f} years')
plt.legend()
plt.show()
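The turning point x* = −β₁/(2β₂) is a nonlinear function of two estimates, so its uncertainty is often reported with a delta-method standard error. The sketch below assumes simulated data with the same quadratic shape (true peak at 30 years, experience drawn on 0-40 so the peak lies inside the data) and, to keep parameter names simple, uses an explicit `experience_sq` column; with `model4` one would pass the quadratic term's name exactly as it appears in `model4.params.index`. The helper `turning_point_se` is hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def turning_point_se(fit, lin, quad):
    """Turning point x* = -b_lin / (2 * b_quad) and its delta-method standard error."""
    b1, b2 = fit.params[lin], fit.params[quad]
    peak = -b1 / (2 * b2)
    # Gradient of x* with respect to (b1, b2)
    grad = np.array([-1 / (2 * b2), b1 / (2 * b2 ** 2)])
    V = fit.cov_params().loc[[lin, quad], [lin, quad]].to_numpy()
    se = float(np.sqrt(grad @ V @ grad))
    return float(peak), se

# Simulated data (assumption): log wage quadratic in experience, true peak = 30 years
rng = np.random.default_rng(1)
n = 2000
d = pd.DataFrame({'experience': rng.uniform(0, 40, n)})
d['experience_sq'] = d['experience'] ** 2
d['log_wage'] = (2.0 + 0.03 * d['experience'] - 0.0005 * d['experience_sq']
                 + rng.normal(0, 0.3, n))

fit = smf.ols('log_wage ~ experience + experience_sq', data=d).fit(cov_type='HC3')
peak, se = turning_point_se(fit, 'experience', 'experience_sq')
print(f"Peak = {peak:.1f} years, delta-method SE = {se:.2f}, "
      f"95% CI ≈ [{peak - 1.96 * se:.1f}, {peak + 1.96 * se:.1f}]")
```

When the estimated peak sits near the edge of the observed range, as in `model4` where experience stops at 30 years, this interval can be wide and the normal approximation behind it less reliable.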

Predicted Wages Across Groups

python

# Predicted wages for representative workers
scenarios = pd.DataFrame({
    'education': [12, 16, 16, 18],
    'experience': [5, 10, 10, 15],
    'female': [0, 0, 1, 0],
    'married': [0, 1, 1, 1],
    'label': ['Male, high school', 'Male, college, married',
              'Female, college, married', 'Male, graduate, married']
})

# Predict log wages, then convert back to levels
scenarios['log_wage_pred'] = model4.predict(scenarios)
scenarios['wage_pred'] = np.exp(scenarios['log_wage_pred'])

# 95% confidence interval for the mean prediction of log wage, mapped back with exp()
# (this is a confidence interval, not a prediction interval for an individual worker)
predictions = model4.get_prediction(scenarios)
pred_summary = predictions.summary_frame(alpha=0.05)
scenarios['ci_lower'] = np.exp(pred_summary['mean_ci_lower'])
scenarios['ci_upper'] = np.exp(pred_summary['mean_ci_upper'])

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
y_pos = np.arange(len(scenarios))

ax.barh(y_pos, scenarios['wage_pred'], alpha=0.7)
ax.errorbar(scenarios['wage_pred'], y_pos, 
           xerr=[scenarios['wage_pred'] - scenarios['ci_lower'],
                 scenarios['ci_upper'] - scenarios['wage_pred']],
           fmt='none', ecolor='black', capsize=5)

ax.set_yticks(y_pos)
ax.set_yticklabels(scenarios['label'])
ax.set_xlabel('Predicted wage (thousand yuan/month)')
ax.set_title('Predicted wages by group with 95% confidence intervals')
ax.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

print("Predictions:")
print(scenarios[['label', 'wage_pred', 'ci_lower', 'ci_upper']])
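One caveat when converting back from logs: exp of the fitted log wage estimates exp(E[log y | x]), which equals the conditional median when the errors are symmetric, not the conditional mean wage. A common correction is Duan's (1983) smearing estimator, which multiplies exp(fitted value) by the sample average of exp(residual). A minimal sketch, assuming residuals from a log-scale OLS fit are available (here simulated as normal with σ = 0.3, for which the factor should be close to exp(σ²/2)); the helper name and the fitted values are hypothetical:

```python
import numpy as np

def smearing_factor(resid):
    """Duan's smearing factor: the sample mean of exp(residual) from a log-scale fit."""
    return float(np.mean(np.exp(np.asarray(resid))))

# Simulated residuals (assumption: homoskedastic normal errors with sigma = 0.3)
rng = np.random.default_rng(7)
resid = rng.normal(0, 0.3, 100_000)
factor = smearing_factor(resid)

log_wage_pred = np.array([2.5, 2.9, 3.1])   # hypothetical fitted log wages
naive_wage = np.exp(log_wage_pred)          # naive back-transformation (median-type)
mean_wage = naive_wage * factor             # smearing-adjusted mean prediction

print(f"Smearing factor = {factor:.4f} (normal theory: {np.exp(0.3 ** 2 / 2):.4f})")
print(np.round(mean_wage, 2))
```

With the model above, `smearing_factor(model4.resid)` would play the role of `factor`; the correction assumes the error distribution does not vary with x.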

Statistical Significance vs. Substantive Significance

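The Problem: Misusing p-values

Common misconceptions:

"p < 0.001, so the effect is very large"
"p > 0.05, so there is no effect"

The correct understanding:

Statistical significance: the strength of the evidence that the effect is not zero
Substantive significance: whether the actual size of the effect matters in practice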

Case Study

python

# Simulate a large sample
np.random.seed(999)
n_large = 10000

education_large = np.random.normal(13, 3, n_large)
# The true effect is tiny: 0.005 (0.5% per year)
log_wage_large = 2.5 + 0.005*education_large + np.random.normal(0, 0.3, n_large)

df_large = pd.DataFrame({'log_wage': log_wage_large, 'education': education_large})
model_large = smf.ols('log_wage ~ education', data=df_large).fit()

print("Large-sample regression:")
print(f"Sample size: {n_large}")
print(f"Education coefficient: {model_large.params['education']:.6f}")
print(f"p-value: {model_large.pvalues['education']:.6f}")
print(f"95% confidence interval: [{model_large.conf_int().loc['education', 0]:.6f}, "
      f"{model_large.conf_int().loc['education', 1]:.6f}]")

# Substantive meaning
effect_pct = model_large.params['education'] * 100
print(f"\nSubstantive interpretation: each additional year of education raises the wage by {effect_pct:.2f}%")
print("Statistically significant, but the effect is tiny (well under 1%), so it has little practical importance")

Assessing Substantive Significance

Criteria (vary by field):

Cohen's d (effect size)
Increment in R²
Judgment of domain experts

python

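# Compute Cohen's d (standardized mean difference using the pooled SD)
def cohens_d(group1, group2):
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))
    return (np.mean(group1) - np.mean(group2)) / pooled_std

# Example: raw gender gap in log wages (no controls)
male_wage = df[df['female'] == 0]['log_wage']
female_wage = df[df['female'] == 1]['log_wage']

d = cohens_d(male_wage, female_wage)
print(f"Cohen's d = {d:.3f}")

# Interpretation using Cohen's conventional benchmarks: 0.2 small, 0.5 medium, 0.8 large
if abs(d) < 0.2:
    print("Effect size: negligible")
elif abs(d) < 0.5:
    print("Effect size: small")
elif abs(d) < 0.8:
    print("Effect size: medium")
else:
    print("Effect size: large")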
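For the "increment in R²" criterion, one common way to turn the gain from added regressors into an effect size is Cohen's f² = (R²_full − R²_reduced)/(1 − R²_full). The sketch below is illustrative: it uses its own simulated data (an assumed gender gap of 0.15 log points) and a hypothetical helper `cohens_f2`. The benchmarks in the comment are Cohen's general-purpose conventions, and the field-specific caveat above still applies to them.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def cohens_f2(r2_full, r2_reduced):
    """Cohen's f^2 for regressors added to a reduced model."""
    return (r2_full - r2_reduced) / (1 - r2_full)

# Simulated data (assumption): a gender gap of 0.15 log points
rng = np.random.default_rng(3)
n = 500
sim = pd.DataFrame({'education': rng.normal(13, 3, n),
                    'female': rng.binomial(1, 0.5, n)})
sim['log_wage'] = (1.5 + 0.08 * sim['education'] - 0.15 * sim['female']
                   + rng.normal(0, 0.3, n))

reduced = smf.ols('log_wage ~ education', data=sim).fit()
full = smf.ols('log_wage ~ education + female', data=sim).fit()

delta_r2 = full.rsquared - reduced.rsquared
f2 = cohens_f2(full.rsquared, reduced.rsquared)
# Cohen's conventional benchmarks for f^2: 0.02 small, 0.15 medium, 0.35 large
print(f"Incremental R² = {delta_r2:.3f}, Cohen's f² = {f2:.3f}")
```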

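Limits of Causal Inference

OLS Regression ≠ Causal Effect

Research designs that can support causal inference (each under its own identifying assumptions):

Randomized controlled trials (RCT)
Natural experiments
Instrumental variables (IV)
Difference-in-differences (DID)
Regression discontinuity design (RDD)

Limitations of OLS regression:

Omitted variable bias
Reverse causality
Selection bias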

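Case: The Causal Effect of Education on Wages

Question: does the OLS coefficient β₁ in log(wage) = β₀ + β₁·education + u measure the causal return to education?

Sources of bias:

Omitted variable: ability
- Higher ability → more education
- Higher ability → higher wages
- Result: the OLS estimate is biased upward
Reverse causality: expected wages → choice of education
Measurement error: years of schooling ignore differences in education quality; classical measurement error in a regressor biases its OLS coefficient toward zero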
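The direction of the ability bias follows from the omitted-variable-bias formula: the short-regression slope converges to β_educ + β_ability·δ, where δ = Cov(education, ability)/Var(education). A minimal simulation check, assuming the same data-generating process as the IV example below but without the instrument (true return 0.05, ability effect 0.20):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000
ability = rng.normal(0, 1, n)
education = 12 + 1.5 * ability + rng.normal(0, 2, n)
log_wage = 2.0 + 0.05 * education + 0.20 * ability + rng.normal(0, 0.3, n)

# Short regression (ability omitted): slope = Cov(educ, wage) / Var(educ)
b_short = np.cov(education, log_wage)[0, 1] / np.var(education, ddof=1)

# OVB formula: b_short ≈ beta_educ + beta_ability * delta,
# where delta is the slope from regressing ability on education
delta = np.cov(education, ability)[0, 1] / np.var(education, ddof=1)
ovb_prediction = 0.05 + 0.20 * delta

# Population value: delta = 1.5 / (1.5**2 + 2**2) = 0.24, so plim b_short = 0.098
print(f"Short-regression slope = {b_short:.4f}")
print(f"OVB-formula prediction = {ovb_prediction:.4f}")
```

The upward bias of roughly 0.048 is the product of two positive terms, which is exactly the "higher ability → more education, higher ability → higher wages" argument above.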

A Classic Identification Strategy: Instrumental Variables

python

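# Simulate an IV setting
np.random.seed(2024)
n = 10000  # large sample so the single instrument is not weak (see the first-stage F below)

# Latent ability (unobservable)
ability = np.random.normal(0, 1, n)

# Instrument: quarter of birth (in the spirit of Angrist & Krueger, 1991)
# Assumption: people born late in the year stay in school slightly longer
# because compulsory schooling laws bind on age
birth_quarter = np.random.choice([1, 2, 3, 4], n)
instrument = (birth_quarter == 4).astype(int)

# Education (endogenous: correlated with ability)
education_iv = 12 + 1.5*ability + 0.5*instrument + np.random.normal(0, 2, n)

# Log wage (true causal effect of education = 0.05)
log_wage_iv = 2.0 + 0.05*education_iv + 0.20*ability + np.random.normal(0, 0.3, n)

df_iv = pd.DataFrame({
    'log_wage': log_wage_iv,
    'education': education_iv,
    'instrument': instrument,
    'ability': ability  # unobservable in real data
})

# OLS (biased because ability is omitted)
model_ols = smf.ols('log_wage ~ education', data=df_iv).fit()
print("OLS estimate (biased):")
print(f"Education coefficient = {model_ols.params['education']:.4f}")

# First stage: instrument relevance (common rule of thumb: F > 10)
first_stage = smf.ols('education ~ instrument', data=df_iv).fit()
print(f"\nFirst-stage F statistic = {first_stage.fvalue:.1f}")

# IV / 2SLS estimate (consistent, though not unbiased in finite samples)
from linearmodels.iv import IV2SLS  # requires the linearmodels package
iv_model = IV2SLS.from_formula('log_wage ~ 1 + [education ~ instrument]', 
                                data=df_iv).fit()
print("\nIV estimate (consistent):")
print(f"Education coefficient = {iv_model.params['education']:.4f}")

print(f"\nTrue causal effect: 0.05")
print(f"OLS upward bias: {model_ols.params['education'] - 0.05:.4f}")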
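With a single binary instrument, the just-identified IV estimate has a transparent form, the Wald estimator: the difference in mean outcomes between z = 1 and z = 0 divided by the difference in mean education, which is algebraically the same as Cov(z, y)/Cov(z, x). The sketch below (helper names hypothetical; data simulated from the same process as above, with a larger sample) checks that the two expressions agree.

```python
import numpy as np

def wald_estimator(y, x, z):
    """IV estimate with one binary instrument: reduced form / first stage in group means."""
    z = np.asarray(z).astype(bool)
    return (y[z].mean() - y[~z].mean()) / (x[z].mean() - x[~z].mean())

def iv_cov_ratio(y, x, z):
    """The same just-identified IV estimate written as Cov(z, y) / Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

rng = np.random.default_rng(2024)
n = 50_000
ability = rng.normal(0, 1, n)
z = (rng.integers(1, 5, n) == 4).astype(int)                  # born in quarter 4
x = 12 + 1.5 * ability + 0.5 * z + rng.normal(0, 2, n)        # education
y = 2.0 + 0.05 * x + 0.20 * ability + rng.normal(0, 0.3, n)   # log wage

wald = wald_estimator(y, x, z)
ratio = iv_cov_ratio(y, x, z.astype(float))
print(f"Wald estimator = {wald:.4f}, Cov ratio = {ratio:.4f}")
```

The form makes the weak-instrument problem visible: when the first-stage difference in mean education is small, the denominator is small and noisy, so the ratio becomes unstable.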

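Complete Case Study: A Publication-Style Paper

Research Questions

Title: Gender Differences in the Returns to Education: Evidence from China's Labor Market

Research questions:

How high is the return to education?
Does the return to education differ by gender?
How does this difference vary with the level of education?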

Data and Methods

python

# Simulate the full dataset
np.random.seed(20250128)
n = 2000

education = np.random.normal(13, 3, n)
experience = np.random.uniform(0, 30, n)
female = np.random.binomial(1, 0.5, n)
region = np.random.choice(['East', 'Central', 'West'], n, p=[0.4, 0.3, 0.3])
married = np.random.binomial(1, 0.6, n)

# DGP: the return to education differs by gender (education × female interaction);
# regional premia: East 0.15, Central 0.05, West 0
region_effects = pd.Series(region).map({'East': 0.15, 'Central': 0.05, 'West': 0.0}).to_numpy()
log_wage = (1.5 + 0.08*education + 0.03*experience - 0.0005*experience**2
            + 0.10*female - 0.015*education*female
            + region_effects + 0.06*married + np.random.normal(0, 0.3, n))

df_final = pd.DataFrame({
    'log_wage': log_wage,
    'education': education,
    'experience': experience,
    'female': female,
    'region': region,
    'married': married
})

# Descriptive statistics (numeric columns)
print("Table 2: Descriptive statistics")
print("="*80)
desc_stats = df_final.describe().T[['mean', 'std', 'min', 'max']]
print(desc_stats)

# Means by gender
print("\nMeans by gender (female = 1):")
print(df_final.groupby('female')[['education', 'experience', 'log_wage']].mean())
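Descriptive tables in applied papers often add the difference in means between groups and a test of that difference. A minimal sketch, assuming a DataFrame with a 0/1 group column like `df_final` and using Welch's unequal-variance t-test from SciPy; the helper `mean_diff_table` and the small simulated frame are illustrative:

```python
import numpy as np
import pandas as pd
from scipy import stats

def mean_diff_table(data, group, cols):
    """Means by a 0/1 group, their difference (1 minus 0), and Welch t-test results."""
    g0, g1 = data[data[group] == 0], data[data[group] == 1]
    rows = []
    for col in cols:
        test = stats.ttest_ind(g1[col], g0[col], equal_var=False)
        rows.append({'variable': col,
                     'mean_0': g0[col].mean(),
                     'mean_1': g1[col].mean(),
                     'diff': g1[col].mean() - g0[col].mean(),
                     't': float(test.statistic),
                     'p': float(test.pvalue)})
    return pd.DataFrame(rows).set_index('variable')

# Simulated example (assumption): women earn 0.1 log points less at the same education
rng = np.random.default_rng(8)
n = 2000
demo = pd.DataFrame({'female': rng.binomial(1, 0.5, n),
                     'education': rng.normal(13, 3, n)})
demo['log_wage'] = (2.5 + 0.08 * demo['education'] - 0.1 * demo['female']
                    + rng.normal(0, 0.3, n))

table = mean_diff_table(demo, 'female', ['education', 'log_wage'])
print(table.round(3))
```

Applied to the data above, the call would be `mean_diff_table(df_final, 'female', ['education', 'experience', 'log_wage'])`.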

Regression Analysis

python

# Models 1-4
m1 = smf.ols('log_wage ~ education', data=df_final).fit(cov_type='HC3')
m2 = smf.ols('log_wage ~ education + experience + I(experience**2)', 
             data=df_final).fit(cov_type='HC3')
m3 = smf.ols('log_wage ~ education + experience + I(experience**2) + female', 
             data=df_final).fit(cov_type='HC3')
m4 = smf.ols('log_wage ~ education * female + experience + I(experience**2) + C(region) + married',
             data=df_final).fit(cov_type='HC3')

# Output table
print("\nTable 3: Wage equations")
table = summary_col([m1, m2, m3, m4], 
                   model_names=['(1)', '(2)', '(3)', '(4)'],
                   stars=True)
print(table)

Visualizing the Main Results

python

# Plot the interaction effect: predicted log wage by education for men and women,
# holding experience = 10, region = East, married = 1
edu_range = np.linspace(6, 20, 50)

# Men
male_pred = m4.predict(pd.DataFrame({
    'education': edu_range,
    'female': 0,
    'experience': 10,
    'region': 'East',
    'married': 1
}))

# Women
female_pred = m4.predict(pd.DataFrame({
    'education': edu_range,
    'female': 1,
    'experience': 10,
    'region': 'East',
    'married': 1
}))

plt.figure(figsize=(10, 6))
plt.plot(edu_range, male_pred, 'b-', linewidth=2, label='Men')
plt.plot(edu_range, female_pred, 'r-', linewidth=2, label='Women')
plt.xlabel('Years of education')
plt.ylabel('Predicted log(wage)')
plt.title('Gender differences in the education-wage relationship')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Gender wage gap at different education levels: female + (education:female) × edu
for edu in [10, 13, 16]:
    gap = (m4.params['female'] + 
           m4.params['education:female'] * edu)
    gap_pct = (np.exp(gap) - 1) * 100
    print(f"Education = {edu} years: gender wage gap = {gap_pct:.1f}%")
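The coefficient on `education` in model (4) is the return to education for men; the return for women is the sum `education` + `education:female`, and its standard error needs the covariance between the two estimates. A minimal sketch using `t_test` with an explicit restriction vector (the helper `lincom` is hypothetical, and the data are simulated from a simplified version of the DGP above, without experience, region, or marital status):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def lincom(fit, weights):
    """t-test of sum(w_k * beta_k) = 0, with weights keyed by parameter name."""
    r = np.zeros((1, len(fit.params)))
    for name, w in weights.items():
        r[0, fit.params.index.get_loc(name)] = w
    return fit.t_test(r)

# Simplified DGP (assumption): return 0.08 for men, 0.08 - 0.015 = 0.065 for women
rng = np.random.default_rng(5)
n = 2000
sim = pd.DataFrame({'education': rng.normal(13, 3, n),
                    'female': rng.binomial(1, 0.5, n)})
sim['log_wage'] = (1.5 + 0.08 * sim['education'] + 0.10 * sim['female']
                   - 0.015 * sim['education'] * sim['female'] + rng.normal(0, 0.3, n))

fit = smf.ols('log_wage ~ education * female', data=sim).fit(cov_type='HC3')
women = lincom(fit, {'education': 1.0, 'education:female': 1.0})
est = float(np.squeeze(women.effect))
se = float(np.squeeze(women.sd))
print(f"Return to education for women: {est:.4f} (SE {se:.4f})")
print(women.summary())
```

Building the restriction vector by name keeps the test correct even if the column order of the design matrix changes, for example when region dummies are added as in `m4`.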

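Section Summary

Key Takeaways

| Topic | Key point |
|---|---|
| Coefficient interpretation | Level-Level, Log-Level, Level-Log, Log-Log |
| Significance | Statistically significant ≠ substantively important |
| Causal inference | OLS ≠ causality; an identification strategy is required |
| Academic writing | Clear, standard, complete |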

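Paper-Writing Checklist

[ ] State the research question clearly
[ ] Describe data sources and variable definitions
[ ] Report descriptive statistics
[ ] Explain the estimation method (OLS, IV, robust SEs)
[ ] Present multiple model specifications
[ ] Interpret the main coefficients (magnitude, significance, substantive meaning)
[ ] Conduct robustness checks
[ ] Discuss the causal identification strategy
[ ] Visualize the main results
[ ] Discuss limitations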
