Variables and Data Types

From Stata/R to Python: Understanding What a Variable Is


What Is a Variable?

In every programming language, a variable is a named "container" that stores data.

Side-by-side comparison

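| Concept | Stata | R | Python |
| --- | --- | --- | --- |
| Create a variable | gen age = 25 | age <- 25 (or age = 25) | age = 25 |
| Variable type | Inferred automatically | Inferred automatically | Inferred automatically |
| Inspect a variable | display age | print(age) | print(age) |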
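To make "inferred automatically" concrete, the following minimal sketch shows that Python attaches the type to the value rather than to the name, so the same name can later hold a value of a different type. The names `value`, `first_type`, and `second_type` are illustrative only.

```python
# Python infers the type from the value on the right-hand side.
value = 25
first_type = type(value).__name__
print(first_type)   # int

# Rebinding the same name to a string is allowed; the type follows the value.
value = "twenty-five"
second_type = type(value).__name__
print(second_type)  # str
```

This flexibility is convenient, but it also means a typo-free script can still fail later if a name unexpectedly holds the wrong type.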

Basic Use of Python Variables

1. Creating variables (assignment)

python
# Create variables
age = 25
name = "Alice"
income = 50000.5
is_student = True

# Print variables
print(age)        # Output: 25
print(name)       # Output: Alice
print(income)     # Output: 50000.5
print(is_student) # Output: True

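Python's naming rules

  • Variable names may contain letters, digits, and underscores
  • Variable names must start with a letter or an underscore
  • Variable names cannot contain spaces or special characters (other than the underscore)
  • Variable names cannot be Python reserved words (such as if, for, class)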
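The rules above can be checked programmatically. The sketch below combines the built-in `str.isidentifier()` method with `keyword.iskeyword()` from the standard library; the helper name `is_valid_name` and the candidate list are hypothetical, chosen only for illustration.

```python
import keyword


def is_valid_name(name: str) -> bool:
    """Return True if `name` is a legal identifier and not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Illustrative candidates: two legal names and three that break a rule.
candidates = ["student_age", "_tmp", "2020_data", "student age", "class"]
for candidate in candidates:
    print(f"{candidate!r}: {is_valid_name(candidate)}")
```

Note that `str.isidentifier()` also accepts non-ASCII letters, since Python 3 allows them in identifiers.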

Good variable names

python
student_age = 25
avg_income = 50000
is_employed = True
gdp_growth_rate = 0.03

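Poor variable names

python
a = 25              # too short; unclear what it stores
StudentAge = 25     # Python convention is lowercase with underscores (snake_case)
student age = 25    # SyntaxError: contains a space
2020_data = 100     # SyntaxError: starts with a digit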

2. Reassigning variables

python
# Initial value
income = 50000
print(income)  # 50000

# Replace the value
income = 60000
print(income)  # 60000

# Update based on the old value
income = income + 5000
print(income)  # 65000

# Shorthand
income += 5000  # equivalent to income = income + 5000
print(income)  # 70000
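The `+=` shorthand has counterparts for the other arithmetic operators. A brief sketch, reusing the `income` name with made-up amounts:

```python
income = 50000

income -= 2000   # same as income = income - 2000
print(income)    # 48000

income *= 2      # same as income = income * 2
print(income)    # 96000

income /= 4      # same as income = income / 4; true division yields a float
print(income)    # 24000.0
```

The last step changes the type from int to float, which is worth keeping in mind when a variable is later expected to be a whole number.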

Compared with Stata

stata
* Stata
gen income = 50000
replace income = 60000
replace income = income + 5000

Python's Basic Data Types

1. Numeric types

(1) Integers (int)

python
age = 25
population = 1400000000
year = 2024

print(type(age))  # <class 'int'>

(2) Floating-point numbers (float)

python
gpa = 3.85
income = 50000.5
interest_rate = 0.05

print(type(gpa))  # <class 'float'>
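Floats are stored in binary, so some decimal fractions are only approximated. The sketch below shows the classic case and one common way to compare floats safely using `math.isclose` from the standard library; the variable name `total` is illustrative.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)              # False: 0.1 and 0.2 have no exact binary representation
print(math.isclose(total, 0.3))  # True: compare within a tolerance instead
print(round(total, 2))           # 0.3
```

When comparing computed rates or averages, a tolerance-based comparison is usually the safer choice than `==`.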

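(3) Numeric operations

python
# Basic operations
x = 10
y = 3

print(x + y)   # 13  (addition)
print(x - y)   # 7   (subtraction)
print(x * y)   # 30  (multiplication)
print(x / y)   # 3.333... (true division; the result is always a float)
print(x // y)  # 3   (floor division)
print(x % y)   # 1   (remainder)
print(x ** y)  # 1000 (exponentiation)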
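Floor division and the remainder operator behave in a way that can surprise newcomers when negative numbers are involved. The following sketch, with illustrative values, also shows `divmod` and what happens when an int meets a float.

```python
# Floor division rounds toward negative infinity, not toward zero.
print(-7 // 2)   # -4
# The remainder takes the sign of the divisor.
print(-7 % 2)    # 1

# divmod returns the quotient and the remainder in one call.
quotient, remainder = divmod(17, 5)
print(quotient, remainder)  # 3 2

# Combining an int with a float produces a float.
mixed = 10 + 2.5
print(mixed, type(mixed).__name__)  # 12.5 float
```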

Compared with Stata/R

stata
* Stata
gen x = 10
gen y = 3
gen sum = x + y
gen division = x / y
gen power = x^y

r
# R
x <- 10
y <- 3
sum <- x + y
division <- x / y
power <- x^y

2. Strings (str)

python
# Create strings (single or double quotes both work)
name = "Alice"
major = 'Economics'
country = "China"

# Concatenation
full_name = "Alice" + " " + "Smith"
print(full_name)  # Alice Smith

# Repetition
laugh = "ha" * 3
print(laugh)  # hahaha

# Length
print(len("Hello"))  # 5

# String methods (each returns a new string; the original is unchanged)
text = "hello world"
print(text.upper())       # HELLO WORLD
print(text.capitalize())  # Hello world
print(text.replace("world", "Python"))  # hello Python
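String methods can be chained, which is handy for tidying free-text survey answers. A minimal sketch, assuming a made-up list of raw responses; the names `raw_responses` and `cleaned` are illustrative.

```python
# Hypothetical raw answers with inconsistent spacing and capitalization.
raw_responses = ["  Male", "FEMALE ", "female", " male "]

# strip() removes surrounding whitespace; lower() normalizes the case.
cleaned = [response.strip().lower() for response in raw_responses]
print(cleaned)  # ['male', 'female', 'female', 'male']

# After cleaning, identical answers compare equal.
print(cleaned.count("female"))  # 2
```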

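Practical example (social-science scenario)

python
# Build a readable variable label
variable_label = "respondent_age_in_years"
cleaned_label = variable_label.replace("_", " ").title()
print(cleaned_label)  # Respondent Age In Years

# Extract the country code and year
country_code = "USA_2020_data"
code = country_code[:3]   # USA (slicing: characters 0 to 2)
year = country_code[4:8]  # 2020 (still a string, not a number)
print(f"Country: {code}, Year: {year}")  # Country: USA, Year: 2020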
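Fixed slice positions only work when every code has the same layout. As an alternative sketch, splitting on the underscore does not depend on the length of each part; the `parts` name is illustrative.

```python
country_code = "USA_2020_data"

# Split on the underscore instead of relying on fixed character positions.
parts = country_code.split("_")
code = parts[0]
year = int(parts[1])  # convert to int so it can be used in arithmetic

print(f"Country: {code}, Year: {year}")  # Country: USA, Year: 2020
print(year + 1)                          # 2021
```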

3. Booleans (bool)

python
# There are only two Boolean values: True and False
is_student = True
has_degree = False
is_employed = True

print(type(is_student))  # <class 'bool'>

# Boolean operations
print(True and False)  # False (and)
print(True or False)   # True  (or)
print(not True)        # False (not)

# Comparisons return Booleans
age = 25
print(age > 18)        # True
print(age == 25)       # True (note the double equals sign)
print(age != 30)       # True (not equal)

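Social-science example

python
age = 25
income = 50000

# Check whether the respondent meets the survey criteria
is_eligible = (age >= 18) and (age <= 65) and (income > 0)
print(f"Eligible for survey: {is_eligible}")  # Eligible for survey: True

# Check whether the respondent is in the high-income group
is_high_income = income > 100000
print(f"High income: {is_high_income}")  # High income: False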
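Python also allows comparisons to be chained, which reads closer to the mathematical notation for a range. A short sketch of the same eligibility check; `is_working_age` is an illustrative name.

```python
age = 25
income = 50000

# 18 <= age <= 65 is equivalent to (age >= 18) and (age <= 65).
is_working_age = 18 <= age <= 65
is_eligible = is_working_age and income > 0

print(is_working_age)  # True
print(is_eligible)     # True
```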

4. None (no value)

python
# None means "no value" (loosely comparable to Stata's . or R's NA)
missing_data = None
print(missing_data)  # None
print(type(missing_data))  # <class 'NoneType'>

# Check for None with "is", not "=="
if missing_data is None:
    print("Data is missing")
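In practice, missing answers usually have to be set aside before computing a statistic. The following sketch assumes a made-up list of incomes in which `None` marks a non-response; all names and numbers are illustrative.

```python
# Hypothetical survey incomes; None marks a respondent who did not answer.
incomes = [50000, None, 62000, None, 48000]

# Keep only the values that were actually reported.
reported = [value for value in incomes if value is not None]
n_missing = len(incomes) - len(reported)
mean_income = sum(reported) / len(reported)

print(f"Missing: {n_missing}, mean of reported: {mean_income:,.2f}")
# Missing: 2, mean of reported: 53,333.33
```

Calling `sum(incomes)` directly would raise a TypeError, because `None` cannot be added to a number.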

Type Conversion

python
# String to number
age_str = "25"
age_int = int(age_str)
print(age_int + 5)  # 30

income_str = "50000.5"
income_float = float(income_str)
print(income_float)  # 50000.5

# Number to string
age = 25
age_str = str(age)
print("Age: " + age_str)  # Age: 25

# Other types to Boolean
print(bool(0))      # False
print(bool(1))      # True
print(bool(""))     # False (empty string)
print(bool("Hi"))   # True
print(bool(None))   # False

Common mistake

python
# Wrong
age = "25"
result = age + 5  # TypeError: a str can only be concatenated with another str

# Right
age = int("25")
result = age + 5  # 30
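Real survey files often contain entries such as "N/A" that cannot be converted at all. One possible way to handle them is sketched below; the helper name `to_number` is hypothetical, and it assumes its input is always a string.

```python
def to_number(text):
    """Convert a string to int or float; return None if it is not numeric."""
    try:
        return int(text)
    except ValueError:
        try:
            return float(text)
        except ValueError:
            return None


print(to_number("25"))       # 25
print(to_number("50000.5"))  # 50000.5
print(to_number("N/A"))      # None
```

Returning `None` keeps the script running and lets the missing value be handled explicitly later.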

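Worked Examples: Social-Science Data Processing

Example 1: computing BMI

python
# Respondent information
name = "Alice"
height_cm = 170  # centimeters
weight_kg = 65   # kilograms

# Compute BMI = weight (kg) / height (m) squared
height_m = height_cm / 100
bmi = weight_kg / (height_m ** 2)

print(f"BMI for {name}: {bmi:.2f}")  # BMI for Alice: 22.49

# Classify the weight status
if bmi < 18.5:
    status = "Underweight"
elif bmi < 25:
    status = "Normal"
elif bmi < 30:
    status = "Overweight"
else:
    status = "Obese"

print(f"Weight status: {status}")  # Weight status: Normal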
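To apply the same calculation to several respondents, the logic can be wrapped in a function. This is a sketch under the same cutoffs as above; the function name `bmi_category` and the respondent data are hypothetical.

```python
def bmi_category(height_cm, weight_kg):
    """Return (bmi, status) using the cutoffs 18.5, 25, and 30."""
    bmi = weight_kg / (height_cm / 100) ** 2
    if bmi < 18.5:
        status = "Underweight"
    elif bmi < 25:
        status = "Normal"
    elif bmi < 30:
        status = "Overweight"
    else:
        status = "Obese"
    return bmi, status


# Made-up respondents: (name, height in cm, weight in kg).
respondents = [("Alice", 170, 65), ("Bob", 180, 85), ("Carol", 160, 45)]
for name, height_cm, weight_kg in respondents:
    bmi, status = bmi_category(height_cm, weight_kg)
    print(f"{name}: BMI {bmi:.2f}, {status}")
```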

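Example 2: income grouping

python
# Respondent income
respondent_id = 1001
annual_income = 75000  # annual income (USD)

# Income brackets (labelled Q1 to Q4; the cutoffs are fixed, not computed from data)
if annual_income < 30000:
    income_quartile = "Q1 (low income)"
elif annual_income < 60000:
    income_quartile = "Q2 (lower-middle income)"
elif annual_income < 100000:
    income_quartile = "Q3 (upper-middle income)"
else:
    income_quartile = "Q4 (high income)"

print(f"Respondent {respondent_id}: {income_quartile}")
# Output: Respondent 1001: Q3 (upper-middle income)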
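The cutoffs above are fixed by hand. If true quartiles are wanted, the cut points have to come from the data. The sketch below uses `statistics.quantiles` from the standard library (available from Python 3.8) on a made-up sample; the helper `income_quartile_of` and the sample values are assumptions for illustration.

```python
import statistics

# Hypothetical sample of annual incomes (USD).
sample_incomes = [20000, 35000, 50000, 75000, 90000, 120000, 150000]

# n=4 returns the three cut points that split the data into quartiles.
q1, q2, q3 = statistics.quantiles(sample_incomes, n=4)


def income_quartile_of(income):
    """Assign a quartile label based on the data-driven cut points."""
    if income <= q1:
        return "Q1"
    elif income <= q2:
        return "Q2"
    elif income <= q3:
        return "Q3"
    return "Q4"


print(q1, q2, q3)
print(income_quartile_of(90000))  # Q3
```

With data-driven cut points, the same income can fall into a different group depending on the sample, which is the usual meaning of a quartile.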

Example 3: coding years of education

python
# Education level (text)
education_level = "Bachelor's Degree"

# Map each level to years of education (number)
education_mapping = {
    "High School": 12,
    "Associate Degree": 14,
    "Bachelor's Degree": 16,
    "Master's Degree": 18,
    "Doctoral Degree": 22
}

# .get() returns the fallback 0 when the level is not in the mapping
years_of_education = education_mapping.get(education_level, 0)
print(f"Years of education: {years_of_education}")  # Years of education: 16
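The same mapping can recode a whole column of answers. In this sketch the fallback is `None` rather than 0, so an unrecognized answer is treated as missing instead of being recorded as zero years of schooling; the `responses` list is made up.

```python
education_mapping = {
    "High School": 12,
    "Associate Degree": 14,
    "Bachelor's Degree": 16,
    "Master's Degree": 18,
    "Doctoral Degree": 22,
}

# Hypothetical answers, including one that is not in the mapping.
responses = ["High School", "Master's Degree", "Unknown", "Bachelor's Degree"]

# dict.get(key) returns None when the key is absent.
years = [education_mapping.get(level) for level in responses]
print(years)  # [12, 18, None, 16]
```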

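Stata vs R vs Python: Data Type Comparison

| Python type | Stata type | R type | Example |
| --- | --- | --- | --- |
| int | numeric (integer) | integer | age = 25 |
| float | numeric (decimal) | numeric | gpa = 3.85 |
| str | string (str#) | character | name = "Alice" |
| bool | numeric (0/1) | logical | is_student = True |
| None | . (missing value) | NA | missing = None |

Notes

  • Stata has no true Boolean type (it uses 0/1)
  • Python's None plays roughly the same role as R's NA and Stata's .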
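Two details behind these notes are worth seeing in code. Python's `bool` also behaves like 0/1 in arithmetic, and `None` is only roughly like a missing value: R's `NA` and Stata's `.` propagate through arithmetic, whereas `None` raises an error. The sketch below uses illustrative names and data, and shows `math.nan` as a float value that does propagate.

```python
import math

# Booleans count as 1 and 0, so summing them counts the True answers.
answers = [True, False, True, True]
n_yes = sum(answers)
share_yes = n_yes / len(answers)
print(n_yes, share_yes)  # 3 0.75

# None does not propagate: arithmetic with it raises TypeError.
try:
    result = None + 1
    none_error = False
except TypeError:
    none_error = True
print(none_error)  # True

# NaN (a float) does propagate through arithmetic.
nan_result = math.nan + 1
print(math.isnan(nan_result))  # True
```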

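Formatted Output (f-strings)

Python 3.6 and later support f-strings, which embed expressions directly inside a string literal:

python
name = "Alice"
age = 25
gpa = 3.856

# Basic usage
print(f"Name: {name}, Age: {age}")
# Output: Name: Alice, Age: 25

# Formatting numbers
print(f"GPA: {gpa:.2f}")  # keep 2 decimal places
# Output: GPA: 3.86

# Formatting large numbers
income = 1234567.89
print(f"Income: ${income:,.2f}")
# Output: Income: $1,234,567.89

# Percentages
growth_rate = 0.0523
print(f"Growth rate: {growth_rate:.2%}")
# Output: Growth rate: 5.23%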
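The format specification also supports column widths and alignment, which helps when printing small tables. A minimal sketch with made-up rows: `<` left-aligns, `>` right-aligns, and the number gives the column width.

```python
# Hypothetical rows: (name, age, gpa).
rows = [("Alice", 25, 3.856), ("Bob", 31, 3.2)]

# Header: name left-aligned in 8 characters, the rest right-aligned.
lines = [f"{'Name':<8}{'Age':>5}{'GPA':>7}"]
for name, age, gpa in rows:
    lines.append(f"{name:<8}{age:>5}{gpa:>7.2f}")

print("\n".join(lines))
```

Every line has the same total width (20 characters here), so the columns line up in a monospaced font.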

Practical template

python
# Generate a survey report
respondent_id = 1001
age = 35
gender = "Male"
income = 85000
education = "Master's"

report = f"""
Respondent Report
====================
ID: {respondent_id}
Age: {age}
Gender: {gender}
Income: ${income:,}
Education: {education}
====================
"""
print(report)

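Common Errors

Error 1: misspelled variable name

python
age = 25
print(aeg)  # NameError: name 'aeg' is not defined

Error 2: mixing strings and numbers

python
result = "25" + 5       # TypeError: cannot add a str and an int
result = int("25") + 5  # 30

Error 3: confusing = with ==

python
age = 25          # assignment
# if age = 25:    # SyntaxError: = assigns, it does not compare
if age == 25:     # comparison
    print("age is 25")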

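Exercises

Exercise 1: basic variable operations

Create the following variables and carry out the calculation:

python
# 1. Create variables
base_salary = 50000
bonus = 8000
tax_rate = 0.25

# 2. Compute after-tax income
# Hint: after-tax income = (base salary + bonus) * (1 - tax rate)

# 3. Format the output
# Output format: "After-tax income: $X,XXX.XX"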

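Exercise 2: type conversion

python
# Given strings
age_str = "35"
income_str = "75000.50"

# 1. Convert both to numbers
# 2. Check whether the income counts as high (> 100000)
# 3. Check whether the respondent is middle-aged (30 to 50 years old)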

Exercise 3: formatted output

python
# Given data
country = "China"
gdp_per_capita = 12000.45
population = 1400000000
growth_rate = 0.062

# Expected output:
# Country: China
# GDP per capita: $12,000.45
# Population: 1,400,000,000
# Growth rate: 6.20%

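Next Steps

In the next section we cover operators (arithmetic, comparison, and logical), laying the groundwork for the conditionals and loops that follow.

Keep it up!

Released under the MIT License. Content copyright belongs to the author.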