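Variables and Data Types
From Stata/R to Python: understanding what a variable really is
What is a variable?
In any programming language, a variable is a named "container" that stores a piece of data.
Comparing the three languages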
| Concept | Stata | R | Python |
|---|---|---|---|
| Create a variable | gen age = 25 | age <- 25 or age = 25 | age = 25 |
| Variable type | Inferred automatically | Inferred automatically | Inferred automatically |
| Inspect a variable | display age | print(age) | print(age) |
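The "inferred automatically" row can be made concrete with a minimal sketch. It relies only on the built-in `type()` function; the name `value` is hypothetical and chosen just for this illustration. In Python the type belongs to the value rather than to the name, so the same name can later refer to a value of a different type.

```python
# Hypothetical name `value`, used only for this illustration.
value = 25                          # an integer literal
first_type = type(value).__name__

value = 25.0                        # rebind the same name to a float
second_type = type(value).__name__

value = "25"                        # and now to a string
third_type = type(value).__name__

print(first_type, second_type, third_type)  # int float str
```

No type declaration is written anywhere: Python reads the type off whatever value is currently assigned to the name.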
Basic use of variables in Python
1. Creating variables (assignment)
python
# Create variables
age = 25
name = "Alice"
income = 50000.5
is_student = True
# Print variables
print(age) # Output: 25
print(name) # Output: Alice
print(income) # Output: 50000.5
print(is_student) # Output: True
Python's naming rules:
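- Variable names may contain letters, digits, and underscores
- Variable names must start with a letter or an underscore
- Variable names cannot contain spaces or special characters (other than the underscore)
- Variable names cannot be Python reserved words (such as if, for, class)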
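The rules above can be checked programmatically. The following is a minimal sketch, assuming only the standard library: `str.isidentifier()` tests the character rules and `keyword.iskeyword()` tests for reserved words. The helper name `is_valid_variable_name` is hypothetical and introduced only for this illustration.

```python
import keyword


def is_valid_variable_name(name):
    """Return True if `name` could be used as a Python variable name."""
    # isidentifier() covers the character rules (letters, digits, underscores,
    # no leading digit); iskeyword() rejects reserved words such as if, for, class.
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ["student_age", "_tmp", "student age", "2020_data", "class"]
for candidate in candidates:
    print(f"{candidate!r}: {is_valid_variable_name(candidate)}")
```

Only the first two candidates pass. Note that `isidentifier()` also accepts non-ASCII letters, so this check is somewhat more permissive than a strict "ASCII letters only" reading of the rules.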
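Good variable names:
python
student_age = 25
avg_income = 50000
is_employed = True
gdp_growth_rate = 0.03
Poor variable names:
python
a = 25 # Too short; unclear what it stands for
StudentAge = 25 # Python convention is lowercase with underscores
student age = 25 # Syntax error: contains a space
2020_data = 100 # Syntax error: starts with a digit
2. Reassigning variables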
python
# Initial value
income = 50000
print(income) # 50000
# Change the value
income = 60000
print(income) # 60000
# Update based on the old value
income = income + 5000
print(income) # 65000
# Shorthand
income += 5000 # Equivalent to income = income + 5000
print(income) # 70000
Compared with Stata:
stata
* Stata
gen income = 50000
replace income = 60000
replace income = income + 5000
Python's basic data types
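Before going through each type in turn, a short overview sketch may help. It stores one value of each basic type in a dictionary and prints the type name Python reports for it. The record and its field names (including `spouse_income`) are hypothetical, made up for this illustration.

```python
# Hypothetical respondent record; the field names are made up for illustration.
respondent = {
    "age": 25,              # int
    "income": 50000.5,      # float
    "name": "Alice",        # str
    "is_student": True,     # bool
    "spouse_income": None,  # NoneType (no value recorded)
}

type_names = {}
for field, value in respondent.items():
    type_names[field] = type(value).__name__
    print(f"{field}: {type_names[field]}")
```

The five type names printed here (int, float, str, bool, NoneType) are the ones covered in the rest of this section.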
1. Numeric types
(1) Integers (int)
python
age = 25
population = 1400000000
year = 2024
print(type(age)) # <class 'int'>
(2) Floating-point numbers (float)
python
gpa = 3.85
income = 50000.5
interest_rate = 0.05
print(type(gpa)) # <class 'float'>
(3) Arithmetic operations
python
# Basic operations
x = 10
y = 3
print(x + y) # 13 (addition)
print(x - y) # 7 (subtraction)
print(x * y) # 30 (multiplication)
print(x / y) # 3.333... (division; the result is always a float)
print(x // y) # 3 (floor division)
print(x % y) # 1 (remainder)
print(x ** y) # 1000 (exponentiation)
Compared with Stata/R:
stata
* Stata
gen x = 10
gen y = 3
gen sum = x + y
gen division = x / y
gen power = x^y
r
# R
x <- 10
y <- 3
sum <- x + y
division <- x / y
power <- x^y
2. Strings (str)
python
# Create strings (single or double quotes both work)
name = "Alice"
major = 'Economics'
country = "China"
# String concatenation
full_name = "Alice" + " " + "Smith"
print(full_name) # Alice Smith
# String repetition
laugh = "ha" * 3
print(laugh) # hahaha
# String length
print(len("Hello")) # 5
# String methods
text = "hello world"
print(text.upper()) # HELLO WORLD
print(text.capitalize()) # Hello world
print(text.replace("world", "Python")) # hello Python
A practical example (social science setting):
python
# Build a readable variable label
variable_label = "respondent_age_in_years"
cleaned_label = variable_label.replace("_", " ").title()
print(cleaned_label) # Respondent Age In Years
# Extract the country code
country_code = "USA_2020_data"
code = country_code[:3] # USA (slicing)
year = country_code[4:8] # 2020 (still a string, not a number)
print(f"Country: {code}, Year: {year}") # Country: USA, Year: 2020
3. Booleans (bool)
python
# There are only two Boolean values: True and False
is_student = True
has_degree = False
is_employed = True
print(type(is_student)) # <class 'bool'>
# Boolean operations
print(True and False) # False (and)
print(True or False) # True (or)
print(not True) # False (not)
# Comparison operations (return a Boolean)
age = 25
print(age > 18) # True
print(age == 25) # True (note the double equals sign)
print(age != 30) # True (not equal)
A social science example:
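python
age = 25
income = 50000
# Check whether the respondent meets the survey criteria
is_eligible = (age >= 18) and (age <= 65) and (income > 0)
print(f"Eligible for the survey: {is_eligible}") # Eligible for the survey: True
# Check whether the respondent is in the high-income group
is_high_income = income > 100000
print(f"High-income group: {is_high_income}") # High-income group: False
4. None (missing value)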
python
# None means "no value" (loosely comparable to . in Stata or NA in R)
missing_data = None
print(missing_data) # None
print(type(missing_data)) # <class 'NoneType'>
# Check for None
if missing_data is None:
    print("Data is missing")
Type conversion
python
# String to number
age_str = "25"
age_int = int(age_str)
print(age_int + 5) # 30
income_str = "50000.5"
income_float = float(income_str)
print(income_float) # 50000.5
# Number to string
age = 25
age_str = str(age)
print("Age: " + age_str) # Age: 25
# Other types to Boolean
print(bool(0)) # False
print(bool(1)) # True
print(bool("")) # False (empty string)
print(bool("Hi")) # True
print(bool(None)) # False
A common mistake:
python
# Incorrect
age = "25"
result = age + 5 # TypeError: a str can only be concatenated with another str
# Correct
age = int("25")
result = age + 5 # 30
Worked examples: social science data processing
Example 1: Computing BMI
python
# Respondent information
name = "Alice"
height_cm = 170 # centimeters
weight_kg = 65 # kilograms
# Compute BMI
height_m = height_cm / 100
bmi = weight_kg / (height_m ** 2)
print(f"BMI of {name}: {bmi:.2f}") # BMI of Alice: 22.49
# Classify weight status
if bmi < 18.5:
    status = "Underweight"
elif bmi < 25:
    status = "Normal"
elif bmi < 30:
    status = "Overweight"
else:
    status = "Obese"
print(f"Weight status: {status}") # Weight status: Normal
Example 2: Income grouping
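python
# Respondent income
respondent_id = 1001
annual_income = 75000 # Annual income (USD)
# Income groups (four bands with fixed, illustrative cutoffs; not quartiles computed from data)
if annual_income < 30000:
    income_quartile = "Q1 (low income)"
elif annual_income < 60000:
    income_quartile = "Q2 (lower-middle income)"
elif annual_income < 100000:
    income_quartile = "Q3 (upper-middle income)"
else:
    income_quartile = "Q4 (high income)"
print(f"Respondent {respondent_id}: {income_quartile}")
# Output: Respondent 1001: Q3 (upper-middle income)
Example 3: Coding years of education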
python
# Education level (text)
education_level = "Bachelor's Degree"
# Map to years of education (number)
education_mapping = {
    "High School": 12,
    "Associate Degree": 14,
    "Bachelor's Degree": 16,
    "Master's Degree": 18,
    "Doctoral Degree": 22
}
# Levels that are not in the mapping fall back to 0
years_of_education = education_mapping.get(education_level, 0)
print(f"Years of education: {years_of_education}") # Years of education: 16
Stata vs R vs Python: data type comparison
| Python type | Stata type | R type | Example |
|---|---|---|---|
| int | numeric (integer) | integer | age = 25 |
| float | numeric (decimal) | numeric | gpa = 3.85 |
| str | string (str#) | character | name = "Alice" |
| bool | numeric (0/1) | logical | is_student = True |
| None | . (missing value) | NA | missing = None |
Notes:
- Stata has no true Boolean type (it uses 0/1 instead)
- Python's None ≈ R's NA ≈ Stata's .
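Both notes have practical consequences, which the following minimal sketch tries to show. The survey data are hypothetical, invented for illustration. Python's built-in `sum()` raises a `TypeError` when it meets `None`, so missing entries have to be filtered out explicitly before averaging. In the other direction, Python's `bool` is a subclass of `int`, so `True` counts as 1 and `False` as 0, much like Stata's 0/1 indicator variables.

```python
# Hypothetical survey responses; None marks a missing answer.
incomes = [50000, None, 75000, 62000, None]
employed = [True, False, True, True, False]

# Drop missing values before averaging; sum() cannot add None to a number.
observed = [x for x in incomes if x is not None]
mean_income = sum(observed) / len(observed)

# True counts as 1 and False as 0, so sum() gives the number employed.
n_employed = sum(employed)
share_employed = n_employed / len(employed)

print(f"Observed incomes: {len(observed)} of {len(incomes)}")
print(f"Mean income: {mean_income:,.2f}")
print(f"Share employed: {share_employed:.0%}")
```

The filter uses `is not None` rather than a plain truthiness test, so that a genuine income of 0 would still be kept in the average.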
Formatted output (f-strings)
Python 3.6+ supports f-strings, which make formatted output very convenient:
python
name = "Alice"
age = 25
gpa = 3.856
# Basic usage
print(f"Name: {name}, Age: {age}")
# Output: Name: Alice, Age: 25
# Format a number
print(f"GPA: {gpa:.2f}") # Keep 2 decimal places
# Output: GPA: 3.86
# Format a large number
income = 1234567.89
print(f"Income: ${income:,.2f}")
# Output: Income: $1,234,567.89
# Percentage
growth_rate = 0.0523
print(f"Growth rate: {growth_rate:.2%}")
# Output: Growth rate: 5.23%
A practical template:
python
# Generate a survey report
respondent_id = 1001
age = 35
gender = "Male"
income = 85000
education = "Master's"
report = f"""
Respondent report
====================
ID: {respondent_id}
Age: {age} years
Gender: {gender}
Income: ${income:,}
Education: {education}
====================
"""
print(report)
Common mistakes
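Mistake 1: Misspelled variable name
python
age = 25
print(aeg) # NameError: name 'aeg' is not defined
Mistake 2: Mixing strings and numbers
python
result = "25" + 5 # TypeError
result = int("25") + 5 # 30
Mistake 3: Confusing = with ==
python
age = 25 # Assignment
# if age = 25: # SyntaxError: = assigns, it does not compare
if age == 25: # Comparison
    print("age is 25")
Exercises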
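Exercise 1: Basic variable operations
Create the following variables and compute the result:
python
# 1. Create variables
base_salary = 50000
bonus = 8000
tax_rate = 0.25
# 2. Compute after-tax income
# Hint: after-tax income = (base salary + bonus) * (1 - tax rate)
# 3. Format the output
# Output format: "After-tax income: $XX,XXX.XX"
Exercise 2: Type conversion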
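python
# Given strings
age_str = "35"
income_str = "75000.50"
# 1. Convert to numbers
# 2. Check whether the income is high (> 100000)
# 3. Check whether the respondent is middle-aged (30-50 years old)
Exercise 3: Formatted output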
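python
# Given data
country = "China"
gdp_per_capita = 12000.45
population = 1400000000
growth_rate = 0.062
# Output format:
# Country: China
# GDP per capita: $12,000.45
# Population: 1,400,000,000
# Growth rate: 6.20%
Next steps
In the next section we will cover operators (arithmetic, comparison, and logical), which lay the groundwork for conditionals and loops.
Keep up the good work!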