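Variables and Data Types

From Stata/R to Python: understanding what variables really are

What is a variable?

In every programming language, a variable is a named "container" that stores data.

Side-by-side comparison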

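Concept	Stata	R	Python
Create a variable	`gen age = 25`	`age <- 25` or `age = 25`	`age = 25`
Variable type	Inferred automatically	Inferred automatically	Inferred automatically
Inspect a variable	`display age`	`print(age)`	`print(age)`

Note that Stata's `gen` creates a column in the dataset (one value per observation), whereas the R and Python lines each create a single value.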
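
To make "inferred automatically" concrete on the Python side, here is a minimal sketch using the built-in `type()`. The name `value` is made up for this illustration. It suggests that the type belongs to the value rather than to the name, so assigning a new value can change the type that is reported.

```python
# The type is inferred from the value on the right-hand side.
value = 25
first_type = type(value).__name__    # name of the type of an integer value

# Rebinding the same name to text changes the inferred type.
value = "twenty-five"
second_type = type(value).__name__

print(first_type, second_type)  # int str
```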

Basic use of variables in Python

1. Creating variables (assignment)

python

# Create variables
age = 25
name = "Alice"
income = 50000.5
is_student = True

# Print variables
print(age)        # Output: 25
print(name)       # Output: Alice
print(income)     # Output: 50000.5
print(is_student) # Output: True

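Python's naming rules:

A variable name may contain letters, digits, and underscores
A variable name must start with a letter or an underscore
A variable name cannot contain spaces or special characters (other than the underscore)
A variable name cannot be a Python reserved word (such as if, for, class)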
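
The rules above can be checked programmatically. The sketch below is one possible way to do it, using the standard `str.isidentifier()` method and the standard-library `keyword` module; the helper `is_valid_name` and the list of candidates are made up for this illustration.

```python
import keyword

def is_valid_name(name: str) -> bool:
    """Return True if `name` can be used as a Python variable name."""
    # isidentifier() covers the character rules; iskeyword() covers reserved words.
    return name.isidentifier() and not keyword.iskeyword(name)

candidates = ["student_age", "_tmp", "2020_data", "student age", "class"]
results = {name: is_valid_name(name) for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {ok}")
```

Only `student_age` and `_tmp` pass: the others start with a digit, contain a space, or are a reserved word.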

Good variable names:

python

student_age = 25
avg_income = 50000
is_employed = True
gdp_growth_rate = 0.03

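Poor variable names:

python

a = 25              # too short; unclear what it stores
StudentAge = 25     # valid, but Python convention is lowercase_with_underscores
# student age = 25  # SyntaxError: contains a space
# 2020_data = 100   # SyntaxError: starts with a digit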

2. Reassigning variables

python

# Initial value
income = 50000
print(income)  # 50000

# Change the value
income = 60000
print(income)  # 60000

# Update based on the old value
income = income + 5000
print(income)  # 65000

# Shorthand
income += 5000  # equivalent to income = income + 5000
print(income)  # 70000

Compared with Stata:

stata

* Stata
gen income = 50000
replace income = 60000
replace income = income + 5000

Python's basic data types

1. Numeric types

(1) Integers (int)

python

age = 25
population = 1400000000
year = 2024

print(type(age))  # <class 'int'>

(2) Floating-point numbers (float)

python

gpa = 3.85
income = 50000.5
interest_rate = 0.05

print(type(gpa))  # <class 'float'>

(3) Arithmetic

python

# Basic operations
x = 10
y = 3

print(x + y)   # 13  (addition)
print(x - y)   # 7   (subtraction)
print(x * y)   # 30  (multiplication)
print(x / y)   # 3.333... (division; the result is always a float)
print(x // y)  # 3   (floor division)
print(x % y)   # 1   (remainder)
print(x ** y)  # 1000 (exponentiation)
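
Two details of these operators often surprise newcomers from Stata or R. The sketch below is only an illustration: it uses the built-in `divmod()` and `round()` and the standard-library `math.isclose()` to show that floor division and remainder go together, and that float results are approximations that should not be compared with `==`.

```python
import math

# divmod returns the floor-division result and the remainder in one call.
quotient, remainder = divmod(10, 3)
print(quotient, remainder)       # 3 1

# Floats are binary approximations, so exact equality can fail.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True

# round() is handy for display, e.g. two decimal places.
rounded = round(10 / 3, 2)
print(rounded)                   # 3.33
```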

Compared with Stata/R:

stata

* Stata
gen x = 10
gen y = 3
gen sum = x + y
gen division = x / y
gen power = x^y

r

# R
x <- 10
y <- 3
sum <- x + y
division <- x / y
power <- x^y

2. Strings

python

# Create strings (single or double quotes both work)
name = "Alice"
major = 'Economics'
country = "China"

# String concatenation
full_name = "Alice" + " " + "Smith"
print(full_name)  # Alice Smith

# String repetition
laugh = "ha" * 3
print(laugh)  # hahaha

# String length
print(len("Hello"))  # 5

# String methods
text = "hello world"
print(text.upper())       # HELLO WORLD
print(text.capitalize())  # Hello world
print(text.replace("world", "Python"))  # hello Python

Practical example (social-science context):

python

# Build a readable variable label
variable_label = "respondent_age_in_years"
cleaned_label = variable_label.replace("_", " ").title()
print(cleaned_label)  # Respondent Age In Years

# Extract the country code and year
country_code = "USA_2020_data"
code = country_code[:3]  # USA (slicing)
year = country_code[4:8]  # 2020
print(f"Country: {code}, Year: {year}")
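
Slicing by position works when every code has the same layout. As an alternative sketch, assuming the parts are always separated by underscores, `str.split()` avoids counting characters, and `int()` turns the year into a number that can be used in arithmetic.

```python
country_code = "USA_2020_data"

# Split on underscores: ["USA", "2020", "data"]; the last part is not needed.
code, year_str, _ = country_code.split("_")

# The year is still text after splitting, so convert it before doing arithmetic.
year = int(year_str)

print(code, year + 1)  # USA 2021
```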

3. Booleans

python

# A Boolean has only two values: True and False
is_student = True
has_degree = False
is_employed = True

print(type(is_student))  # <class 'bool'>

# Boolean operations
print(True and False)  # False (and)
print(True or False)   # True  (or)
print(not True)        # False (not)

# Comparisons (these return Booleans)
age = 25
print(age > 18)        # True
print(age == 25)       # True (note the double equals sign)
print(age != 30)       # True (not equal)

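Social-science example:

python

age = 25
income = 50000

# Check whether the respondent meets the survey criteria
is_eligible = (age >= 18) and (age <= 65) and (income > 0)
print(f"Eligible for survey: {is_eligible}")  # Eligible for survey: True

# Check whether the respondent is in the high-income group
is_high_income = income > 100000
print(f"High income: {is_high_income}")  # High income: False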
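
Python also allows comparisons to be chained, which reads closer to the mathematical notation for a range check. The sketch below rewrites the eligibility test that way; the second respondent (`too_young`) is a made-up value for contrast.

```python
age = 25
income = 50000

# 18 <= age <= 65 is equivalent to (age >= 18) and (age <= 65).
is_eligible = 18 <= age <= 65 and income > 0
print(is_eligible)  # True

# A made-up respondent below the age range fails the same check.
too_young = 16
is_young_eligible = 18 <= too_young <= 65
print(is_young_eligible)  # False
```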

4. None (no value)

python

# None means "no value" (roughly comparable to Stata's . or R's NA)
missing_data = None
print(missing_data)  # None
print(type(missing_data))  # <class 'NoneType'>

# Check whether a value is None
if missing_data is None:
    print("Data missing")
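
Unlike Stata's `.`, `None` does not silently drop out of arithmetic: adding it to a number raises a `TypeError`. A minimal sketch of one way to handle this, using a made-up list of survey incomes, is to filter out the `None` entries before computing a statistic.

```python
# Made-up survey responses; None marks a missing answer.
incomes = [50000, None, 62000, None, 48000]

# Keep only the observed values.
observed = [x for x in incomes if x is not None]

n_missing = len(incomes) - len(observed)
mean_income = sum(observed) / len(observed)

print(n_missing)             # 2
print(f"{mean_income:.2f}")  # 53333.33
```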

Type conversion

python

# String to number
age_str = "25"
age_int = int(age_str)
print(age_int + 5)  # 30

income_str = "50000.5"
income_float = float(income_str)
print(income_float)  # 50000.5

# Number to string
age = 25
age_str = str(age)
print("Age: " + age_str)  # Age: 25

# Other types to Boolean
print(bool(0))      # False
print(bool(1))      # True
print(bool(""))     # False (empty string)
print(bool("Hi"))   # True
print(bool(None))   # False

Common mistake:

python

# Wrong
age = "25"
result = age + 5  # TypeError: a string can only be concatenated with another string

# Right
age = int("25")
result = age + 5  # 30
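
Real survey data often contains entries that cannot be converted at all, in which case `int()` raises a `ValueError`. The sketch below shows one possible defensive pattern; the helper name `to_int_or_none` and the sample inputs are made up for this illustration.

```python
def to_int_or_none(text):
    """Convert text to int; return None if it is not a valid integer."""
    try:
        return int(text)
    except (TypeError, ValueError):
        # TypeError: text is None; ValueError: text is not an integer literal.
        return None

print(to_int_or_none("25"))    # 25
print(to_int_or_none(" 30 "))  # 30 (surrounding whitespace is allowed)
print(to_int_or_none("N/A"))   # None
print(to_int_or_none("25.5"))  # None (int() does not parse decimal text)
```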

Worked examples: social-science data processing

Example 1: computing BMI

python

# Respondent information
name = "Alice"
height_cm = 170  # centimetres
weight_kg = 65   # kilograms

# Compute BMI
height_m = height_cm / 100
bmi = weight_kg / (height_m ** 2)

print(f"{name}'s BMI: {bmi:.2f}")  # Alice's BMI: 22.49

# Classify weight status
if bmi < 18.5:
    status = "underweight"
elif bmi < 25:
    status = "normal"
elif bmi < 30:
    status = "overweight"
else:
    status = "obese"

print(f"Weight status: {status}")  # Weight status: normal
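
To apply the same calculation to several respondents, the logic can be wrapped in a function. Functions are not covered in this section, so treat the following as a forward-looking sketch; the function name `bmi_status` and the three respondents are made up, while the cutoffs are the ones used above.

```python
def bmi_status(weight_kg, height_cm):
    """Return (bmi, status) using the same cutoffs as the example above."""
    height_m = height_cm / 100
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        status = "underweight"
    elif bmi < 25:
        status = "normal"
    elif bmi < 30:
        status = "overweight"
    else:
        status = "obese"
    return bmi, status

# Made-up respondents: (name, weight in kg, height in cm)
respondents = [("Alice", 65, 170), ("Bob", 95, 170), ("Chen", 50, 175)]
for name, weight, height in respondents:
    bmi, status = bmi_status(weight, height)
    print(f"{name}: {bmi:.2f} ({status})")
```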

Example 2: income brackets

python

# Respondent income
respondent_id = 1001
annual_income = 75000  # annual income (USD)

# Income brackets (labelled Q1-Q4; fixed cutoffs, not data-derived quartiles)
if annual_income < 30000:
    income_quartile = "Q1 (low income)"
elif annual_income < 60000:
    income_quartile = "Q2 (lower-middle income)"
elif annual_income < 100000:
    income_quartile = "Q3 (upper-middle income)"
else:
    income_quartile = "Q4 (high income)"

print(f"Respondent {respondent_id}: {income_quartile}")
# Output: Respondent 1001: Q3 (upper-middle income)
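
When the number of brackets grows, a long `if`/`elif` chain becomes hard to maintain. One alternative, sketched here with the standard-library `bisect` module, keeps the cutoffs and labels in two lists. The names `CUTOFFS`, `LABELS`, and `income_bracket` are made up for this illustration; the cutoffs are the ones used above.

```python
import bisect

CUTOFFS = [30000, 60000, 100000]
LABELS = [
    "Q1 (low income)",
    "Q2 (lower-middle income)",
    "Q3 (upper-middle income)",
    "Q4 (high income)",
]

def income_bracket(annual_income):
    """Return the bracket label; a value equal to a cutoff falls in the higher bracket."""
    # bisect_right counts how many cutoffs are <= annual_income.
    return LABELS[bisect.bisect_right(CUTOFFS, annual_income)]

print(income_bracket(75000))  # Q3 (upper-middle income)
```

Using `bisect_right` reproduces the strict `<` comparisons of the original chain, so an income of exactly 30000 lands in Q2.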

Example 3: coding years of education

python

# Education level (text)
education_level = "Bachelor's Degree"

# Convert to years of education (number)
education_mapping = {
    "High School": 12,
    "Associate Degree": 14,
    "Bachelor's Degree": 16,
    "Master's Degree": 18,
    "Doctoral Degree": 22
}

years_of_education = education_mapping.get(education_level, 0)
print(f"Years of education: {years_of_education}")  # Years of education: 16
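
The example above falls back to 0 for an unrecognized level, which is indistinguishable from "zero years of education". A possible variation, sketched below with a made-up list of responses, omits the default so that `dict.get()` returns `None` and unknown categories stay visibly missing.

```python
education_mapping = {
    "High School": 12,
    "Associate Degree": 14,
    "Bachelor's Degree": 16,
    "Master's Degree": 18,
    "Doctoral Degree": 22,
}

# Made-up responses; "Unknown" is not a key in the mapping.
responses = ["High School", "Master's Degree", "Unknown"]

# Without a default, get() returns None for keys that are not in the mapping.
years = [education_mapping.get(level) for level in responses]
print(years)  # [12, 18, None]
```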

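Stata vs R vs Python: data type comparison

Python type	Stata type	R type	Example
`int`	numeric (integer)	integer	`age = 25`
`float`	numeric (decimal)	numeric	`gpa = 3.85`
`str`	string (str#)	character	`name = "Alice"`
`bool`	numeric (0/1)	logical	`is_student = True`
`None`	`.` (missing value)	`NA`	`missing = None`

Notes:

Stata has no true Boolean type (it uses 0/1 instead)
Python's None ≈ R's NA ≈ Stata's . (a rough analogy: R's NULL is the closer match to None, and Python data libraries usually represent missing numbers as NaN)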
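
Stata users may find it reassuring that Python Booleans still behave like 0/1 in arithmetic, because `bool` is a subclass of `int`. The sketch below uses a made-up list of employment flags to count and average them, much as one would with a 0/1 dummy variable.

```python
# bool is a subclass of int: True counts as 1 and False as 0.
print(isinstance(True, int))  # True

# Made-up employment indicators for four respondents.
employed = [True, False, True, True]

n_employed = sum(employed)          # number of True values
share = n_employed / len(employed)  # proportion employed

print(n_employed)      # 3
print(f"{share:.0%}")  # 75%
```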

Formatted output (f-strings)

Python 3.6+ supports f-strings, which make formatting convenient:

python

name = "Alice"
age = 25
gpa = 3.856

# Basic usage
print(f"Name: {name}, Age: {age}")
# Output: Name: Alice, Age: 25

# Format a number
print(f"GPA: {gpa:.2f}")  # keep 2 decimal places
# Output: GPA: 3.86

# Format a large number
income = 1234567.89
print(f"Income: ${income:,.2f}")
# Output: Income: $1,234,567.89

# Percentage
growth_rate = 0.0523
print(f"Growth rate: {growth_rate:.2%}")
# Output: Growth rate: 5.23%
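
The same format specification also controls width and alignment, which helps when printing small tables of results. In the sketch below, `<10` left-aligns text in 10 characters and `>12,.2f` right-aligns a number in 12 characters with thousands separators. The rows are made-up illustrative values, and "Testland" is a fictional country.

```python
# Made-up rows: (country, GDP per capita)
rows = [("China", 12000.45), ("Testland", 9876.5)]

# Left-align the name in 10 characters; right-align the number in 12.
lines = [f"{country:<10}{gdp:>12,.2f}" for country, gdp in rows]

for line in lines:
    print(line)
```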

A practical template:

python

# Generate a survey report
respondent_id = 1001
age = 35
gender = "Male"
income = 85000
education = "Master's"

report = f"""
Respondent report
====================
ID: {respondent_id}
Age: {age} years
Gender: {gender}
Income: ${income:,}
Education: {education}
====================
"""
print(report)

Common errors

Error 1: misspelled variable name

python

age = 25
print(aeg)  #  NameError: name 'aeg' is not defined

Error 2: mixing strings and numbers

python

# result = "25" + 5     # wrong: TypeError
result = int("25") + 5  # right: 30

Error 3: confusing = with ==

python

age = 25         # assignment
# if age = 25:   # SyntaxError: = assigns, it does not compare
if age == 25:    # comparison
    print("age is 25")

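Exercises

Exercise 1: basic variable operations

Create the following variables and compute the result:

python

# 1. Create variables
base_salary = 50000
bonus = 8000
tax_rate = 0.25

# 2. Compute after-tax income
# Hint: after-tax income = (base salary + bonus) * (1 - tax rate)

# 3. Format the output
# Output format: "After-tax income: $XX,XXX.XX"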

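Exercise 2: type conversion

python

# Given strings
age_str = "35"
income_str = "75000.50"

# 1. Convert them to numbers
# 2. Check whether the income counts as high (> 100000)
# 3. Check whether the respondent is middle-aged (30-50 years old)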

Exercise 3: formatted output

python

# Given data
country = "China"
gdp_per_capita = 12000.45
population = 1400000000
growth_rate = 0.062

# Expected output:
# Country: China
# GDP per capita: $12,000.45
# Population: 1,400,000,000
# Growth rate: 6.20%

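Next steps

In the next section we will cover operators (arithmetic, comparison, and logical), laying the groundwork for the conditionals and loops that follow.

Keep it up!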
