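Loops (for/while)
Make the program repeat work: the key to batch-processing data
What is a loop?
A loop runs the same block of code repeatedly, so you write the logic once instead of copying it.
An everyday analogy:
- By hand: check 1,000 questionnaires one at a time (exhausting)
- With a loop: write the check once and have it run 1,000 times automatically (effortless)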
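The analogy can be sketched in a few lines. The data here is simulated, the validity rule (an age between 18 and 100) is only an assumption for illustration, and `is_valid_age` is a hypothetical helper rather than part of any library.

```python
import random

random.seed(0)  # fixed seed so the simulated data is reproducible

# Simulate 1,000 questionnaires, each reduced to a reported age (made-up data)
questionnaires = [random.randint(10, 110) for _ in range(1000)]


def is_valid_age(age):
    """Hypothetical check: treat ages from 18 to 100 as valid."""
    return 18 <= age <= 100


# The check is written once; the loop applies it to every questionnaire
valid_count = 0
for age in questionnaires:
    if is_valid_age(age):
        valid_count += 1

print(f"Checked {len(questionnaires)} questionnaires, {valid_count} passed")
```

Changing the rule means editing one function, no matter how many questionnaires there are.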
Compared with Stata/R:
- Stata: foreach, forvalues, while
- R: for, while, the apply family
- Python: for, while
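for loops
1. Iterating over a list
python
# Student roster
students = ["Alice", "Bob", "Carol", "David"]
# Print each one in turn
for student in students:
    print(f"Student: {student}")
# Output:
# Student: Alice
# Student: Bob
# Student: Carol
# Student: David
Syntax:
python
for variable in sequence:
    code to run
2. Iterating over a range of numbers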
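A quick way to see exactly which numbers a range produces is to wrap it in `list()`. The sketch below does that for the three call forms shown in this section, plus a negative step, which counts downward.

```python
# range() is lazy; list() materializes its values so they can be inspected
first_five = list(range(5))        # start defaults to 0, end is excluded
one_to_five = list(range(1, 6))    # start is included, end is excluded
evens = list(range(0, 10, 2))      # every second number
countdown = list(range(5, 0, -1))  # negative step counts down; 0 is excluded

print(first_five)   # [0, 1, 2, 3, 4]
print(one_to_five)  # [1, 2, 3, 4, 5]
print(evens)        # [0, 2, 4, 6, 8]
print(countdown)    # [5, 4, 3, 2, 1]
```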
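python
# range(n): generates 0 through n-1
for i in range(5):
    print(i)
# Output (one value per line): 0 1 2 3 4
# range(start, end): generates start through end-1
for i in range(1, 6):
    print(i)
# Output (one value per line): 1 2 3 4 5
# range(start, end, step): sets the step size
for i in range(0, 10, 2):
    print(i)
# Output (one value per line): 0 2 4 6 8
3. Comparison with Stata/R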
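For orientation: the Stata `foreach ... summarize` pattern shown below maps onto a Python loop over variable names. This sketch assumes the data sits in a plain dictionary of lists (the name `data` and its values are made up for illustration) and uses the standard-library `statistics` module as a rough stand-in for `summarize`.

```python
import statistics

# Hypothetical data set: one list of values per variable (made-up numbers)
data = {
    "age": [25, 30, 45, 28, 35],
    "income": [45000, 75000, 120000, 55000, 95000],
    "education": [12, 16, 18, 16, 14],  # years of schooling
}

summaries = {}
for var in ["age", "income", "education"]:
    values = data[var]
    stats = {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }
    summaries[var] = stats
    print(f"{var}: n={stats['n']}, mean={stats['mean']:.1f}, "
          f"min={stats['min']}, max={stats['max']}")
```

In practice this kind of summary is usually done with a data-frame library; the loop is shown only to mirror the Stata and R versions.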
Stata example:
stata
* Stata
forvalues i = 1/5 {
    display `i'
}
foreach var in age income education {
    summarize `var'
}
R example:
r
# R
for (i in 1:5) {
  print(i)
}
for (var in c("age", "income", "education")) {
  print(summary(data[[var]]))
}
Python example:
python
# Python
for i in range(1, 6):
    print(i)
variables = ["age", "income", "education"]
for var in variables:
    print(f"Summary statistics for variable {var}")
Hands-on cases: data processing
Case 1: Computing BMI in bulk
python
# Respondent data
heights = [170, 165, 180, 175, 160]  # cm
weights = [65, 55, 80, 70, 50]  # kg
# Compute BMI for every respondent
bmis = []
for i in range(len(heights)):
    height_m = heights[i] / 100
    bmi = weights[i] / (height_m ** 2)
    bmis.append(bmi)
    print(f"Respondent {i+1}: BMI = {bmi:.2f}")
# Output:
# Respondent 1: BMI = 22.49
# Respondent 2: BMI = 20.20
# Respondent 3: BMI = 24.69
# ...
A more Pythonic version (using zip and enumerate):
python
heights = [170, 165, 180, 175, 160]
weights = [65, 55, 80, 70, 50]
for i, (h, w) in enumerate(zip(heights, weights), start=1):
    height_m = h / 100
    bmi = w / (height_m ** 2)
    print(f"Respondent {i}: BMI = {bmi:.2f}")
Case 2: Data quality check
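python
# Survey data (age column)
ages = [25, 30, -5, 150, 45, 28, 0, 35]
# Check for anomalous values
print("=== Age data quality check ===")
valid_count = 0
invalid_count = 0
suspicious_count = 0  # counted separately so the summary covers every entry
for i, age in enumerate(ages, start=1):
    if age < 0 or age > 120:
        print(f"Entry {i} is invalid: {age}")
        invalid_count += 1
    elif age == 0:
        print(f"Entry {i} is suspicious: {age}")
        suspicious_count += 1
    else:
        print(f"Entry {i} is OK: {age}")
        valid_count += 1
print(f"\nSummary: {valid_count} OK, {invalid_count} invalid, {suspicious_count} suspicious")
Case 3: Income group counts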
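A variant of this grouping keeps the counts in a dictionary keyed by group label, which can be easier to extend when groups are added. This is a sketch rather than the only way to do it: `income_group` is a hypothetical helper, and the cut-offs simply mirror the ones in the code below.

```python
def income_group(income):
    """Hypothetical helper: map an income to a group label."""
    if income < 50000:
        return "low"
    elif income <= 100000:
        return "middle"
    return "high"


incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]

# Start every group at zero so groups with no respondents still appear
counts = {"low": 0, "middle": 0, "high": 0}
for income in incomes:
    counts[income_group(income)] += 1

for group, n in counts.items():
    print(f"{group}: {n}")
```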
python
# Respondent incomes
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]
# Count respondents per group
low_income = 0  # < 50000
mid_income = 0  # 50000-100000
high_income = 0  # > 100000
for income in incomes:
    if income < 50000:
        low_income += 1
    elif income <= 100000:
        mid_income += 1
    else:
        high_income += 1
print(f"Low income: {low_income}")
print(f"Middle income: {mid_income}")
print(f"High income: {high_income}")
# Output:
# Low income: 2
# Middle income: 3
# High income: 2
while loops
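Basic syntax
python
# while loop: keeps running as long as the condition is true
count = 0
while count < 5:
    print(f"Count: {count}")
    count += 1
# Output (one per line): Count: 0, Count: 1, Count: 2, Count: 3, Count: 4
Practical scenario: data collection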
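A while loop is most useful when the number of iterations is not known in advance. In the sketch below, collection stops once enough valid answers have arrived, however many invalid ones are skipped along the way. The `incoming` list and the 1-5 validity rule are assumptions for illustration.

```python
# Hypothetical stream of answers; None means missing, valid answers are 1-5
incoming = [4, None, 5, 7, 3, None, 2, 5]
target_valid = 4

valid = []
position = 0
# Stop when the target is met, or when the data runs out
while len(valid) < target_valid and position < len(incoming):
    answer = incoming[position]
    position += 1
    if answer is not None and 1 <= answer <= 5:
        valid.append(answer)

print(f"Collected {len(valid)} valid answers after reading {position} entries")
print(valid)  # [4, 5, 3, 2]
```

The second condition (`position < len(incoming)`) keeps the loop from running forever if the target can never be reached.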
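python
# Simulate collecting survey responses (until 100 have been collected)
responses = 0
target = 100
while responses < target:
    # Simulate collecting one response (in practice, this would be read from a database)
    responses += 1
    if responses % 10 == 0:  # show progress every 10 responses
        progress = (responses / target) * 100
        print(f"Progress: {progress:.0f}% ({responses}/{target})")
print("Data collection complete!")
Avoiding infinite loops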
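One common safeguard, sketched here, is to cap the number of iterations so that a loop whose condition never becomes false still terminates. The names `max_iterations` and `value`, and the halving step, are hypothetical choices for illustration.

```python
value = 100.0
max_iterations = 1000  # hypothetical safety cap
iterations = 0

# Halve the value until it drops below 1, but never exceed the cap
while value >= 1 and iterations < max_iterations:
    value /= 2
    iterations += 1

if value >= 1:
    # The condition was still true, so the cap is what ended the loop
    print("Stopped by the safety cap; check the loop condition")
else:
    print(f"Finished after {iterations} iterations, value = {value}")
```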
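python
# Infinite loop (never stops; do not run this version)
count = 0
while count < 5:
    print(count)
    # count is never incremented, so the program hangs
# Correct version
count = 0
while count < 5:
    print(count)
    count += 1  # ensures the condition eventually becomes False
Loop control: break and continue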
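1. break: exit the loop early
python
# Stop as soon as the first invalid value is found
ages = [25, 30, 45, -5, 28, 35]
for i, age in enumerate(ages, start=1):
    if age < 0 or age > 120:
        print(f"Invalid value found (entry {i}): {age}")
        print("Stopping the check")
        break  # leave the loop immediately
    else:
        print(f"Entry {i} is OK")
# Output:
# Entry 1 is OK
# Entry 2 is OK
# Entry 3 is OK
# Invalid value found (entry 4): -5
# Stopping the check
2. continue: skip the current iteration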
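python
# Process only the even numbers
for i in range(10):
    if i % 2 != 0:  # odd number
        continue  # skip the rest of the body and move on to the next iteration
    print(f"{i} is even")
# Output (one per line): 0 is even, 2 is even, 4 is even, 6 is even, 8 is even
Hands-on: skipping missing data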
python
# Survey responses (None marks a missing value)
responses = [5, 4, None, 3, None, 5, 4, 2]
# Compute the mean score, skipping missing values
total = 0
count = 0
for response in responses:
    if response is None:
        continue  # skip the missing value
    total += response
    count += 1
if count > 0:
    average = total / count
    print(f"Mean score: {average:.2f} (valid responses: {count})")
# Output: Mean score: 3.83 (valid responses: 6)
Advanced loop techniques
1. List comprehensions
A more concise way to write a loop:
python
# Traditional for loop
squares = []
for i in range(10):
    squares.append(i ** 2)
# List comprehension (a single line)
squares = [i ** 2 for i in range(10)]
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
# List comprehension with a condition
# Keep only the squares of even numbers
even_squares = [i ** 2 for i in range(10) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]
Social science applications:
python
import math
# Select high-income respondents
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]
high_incomes = [inc for inc in incomes if inc > 100000]
print(high_incomes)  # [120000, 150000]
# Log-transform income (the inc > 0 filter avoids a math.log error on non-positive values)
log_incomes = [math.log(inc) for inc in incomes if inc > 0]
2. enumerate(): get the index and the value together
python
students = ["Alice", "Bob", "Carol"]
# Not recommended (managing the index by hand)
for i in range(len(students)):
    print(f"{i+1}. {students[i]}")
# Recommended (using enumerate)
for i, student in enumerate(students, start=1):
    print(f"{i}. {student}")
# Output:
# 1. Alice
# 2. Bob
# 3. Carol
3. zip(): iterate over several lists in parallel
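One detail worth knowing before relying on `zip()`: it stops at the shortest input, so a missing value in one list silently drops the tail of the others. The sketch below shows this and adds a simple length check; the lists are made up for illustration.

```python
names = ["Alice", "Bob", "Carol"]
ages = [25, 30]  # one value is missing

# zip() pairs items position by position and stops at the shortest list
pairs = list(zip(names, ages))
print(pairs)  # [('Alice', 25), ('Bob', 30)]

# A simple guard: compare lengths before looping
if len(names) != len(ages):
    print(f"Length mismatch: {len(names)} names but {len(ages)} ages")
```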
python
names = ["Alice", "Bob", "Carol"]
ages = [25, 30, 28]
majors = ["Economics", "Sociology", "Political Science"]
for name, age, major in zip(names, ages, majors):
    print(f"{name}, age {age}, major: {major}")
# Output:
# Alice, age 25, major: Economics
# Bob, age 30, major: Sociology
# Carol, age 28, major: Political Science
4. Nested loops
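Nested loops like the ones below can also be written with `itertools.product` from the standard library, which yields every combination of its inputs in the same order as the equivalent nested loops. A sketch using the same three category lists:

```python
from itertools import product

genders = ["Male", "Female"]
age_groups = ["18-30", "31-45", "46-60"]
education_levels = ["High School", "Bachelor's", "Master's"]

# product() varies the last input fastest, like the innermost loop
combinations = list(product(genders, age_groups, education_levels))

for count, (gender, age_group, education) in enumerate(combinations, start=1):
    print(f"{count}. {gender}, {age_group}, {education}")

print(f"Total: {len(combinations)} combinations")
```

Explicit nesting tends to be easier to read while learning; `product` mainly helps once the nesting gets deep.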
python
# Generate every possible questionnaire combination
genders = ["Male", "Female"]
age_groups = ["18-30", "31-45", "46-60"]
education_levels = ["High School", "Bachelor's", "Master's"]
print("=== All possible respondent types ===")
count = 0
for gender in genders:
    for age_group in age_groups:
        for education in education_levels:
            count += 1
            print(f"{count}. {gender}, {age_group}, {education}")
# Total: 2 × 3 × 3 = 18 combinations
Complete example: batch-processing survey data
python
# === Simulated survey data ===
survey_data = [
    {"id": 1, "age": 25, "income": 50000, "satisfaction": 4},
    {"id": 2, "age": -5, "income": 75000, "satisfaction": 5},  # invalid age
    {"id": 3, "age": 30, "income": -10000, "satisfaction": 3},  # invalid income
    {"id": 4, "age": 28, "income": 60000, "satisfaction": 6},  # invalid satisfaction
    {"id": 5, "age": 35, "income": 80000, "satisfaction": 4},
    {"id": 6, "age": 40, "income": 95000, "satisfaction": 5},
]
# === Data quality check ===
print("=== Survey data quality check ===\n")
valid_responses = []
invalid_responses = []
for response in survey_data:
    resp_id = response["id"]
    age = response["age"]
    income = response["income"]
    satisfaction = response["satisfaction"]
    # Validation rules
    errors = []
    if age < 18 or age > 100:
        errors.append(f"age out of range ({age})")
    if income < 0:
        errors.append(f"income out of range ({income})")
    if satisfaction < 1 or satisfaction > 5:
        errors.append(f"satisfaction out of range ({satisfaction})")
    # Sort the response into valid or invalid
    if errors:
        print(f"Survey {resp_id}: {', '.join(errors)}")
        invalid_responses.append(response)
    else:
        print(f"Survey {resp_id}: passed")
        valid_responses.append(response)
# === Summary ===
print("\n=== Summary ===")
print(f"Total surveys: {len(survey_data)}")
print(f"Valid surveys: {len(valid_responses)}")
print(f"Invalid surveys: {len(invalid_responses)}")
print(f"Valid rate: {len(valid_responses)/len(survey_data)*100:.1f}%")
# === Descriptive statistics (valid responses only) ===
if valid_responses:
    print("\n=== Descriptive statistics (valid responses) ===")
    # Age
    ages = [r["age"] for r in valid_responses]
    avg_age = sum(ages) / len(ages)
    print(f"Mean age: {avg_age:.1f} years")
    # Income
    incomes = [r["income"] for r in valid_responses]
    avg_income = sum(incomes) / len(incomes)
    print(f"Mean income: ${avg_income:,.0f}")
    # Satisfaction
    satisfactions = [r["satisfaction"] for r in valid_responses]
    avg_satisfaction = sum(satisfactions) / len(satisfactions)
    print(f"Mean satisfaction: {avg_satisfaction:.2f} / 5.0")
Output:
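=== Survey data quality check ===

Survey 1: passed
Survey 2: age out of range (-5)
Survey 3: income out of range (-10000)
Survey 4: satisfaction out of range (6)
Survey 5: passed
Survey 6: passed

=== Summary ===
Total surveys: 6
Valid surveys: 3
Invalid surveys: 3
Valid rate: 50.0%

=== Descriptive statistics (valid responses) ===
Mean age: 33.3 years
Mean income: $75,000
Mean satisfaction: 4.33 / 5.0
Common mistakes
Mistake 1: Modifying a list while iterating over it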
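To see the skipping concretely, the sketch below runs the problematic pattern on a list that contains only even numbers. Every element should be removed, yet half of them survive, because each removal shifts the remaining items left while the loop's internal position keeps advancing.

```python
numbers = [2, 4, 6, 8]
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)  # shifts later items left, so the next one is skipped

print(numbers)  # [4, 8], not the empty list one might expect

# Building a new list from the original avoids the problem
kept = [num for num in [2, 4, 6, 8] if num % 2 != 0]
print(kept)  # []
```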
python
# Dangerous
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)  # causes the loop to skip elements
# Correct approach: build a new list
numbers = [1, 2, 3, 4, 5]
odd_numbers = [num for num in numbers if num % 2 != 0]
Mistake 2: The end value of range()
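python
# A common misunderstanding: expecting range(5) to reach 5
for i in range(5):
    print(i)
# Output (one value per line): 0 1 2 3 4 (5 is not included!)
# To include 5
for i in range(1, 6):
    print(i)
# Output (one value per line): 1 2 3 4 5
Mistake 3: Indentation errors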
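Indentation is also what decides which statements belong to the loop body, so a line can be valid Python and still sit at the wrong level. In this sketch the two `print` calls differ only in their indentation; the `scores` list is made up for illustration.

```python
scores = [3, 4, 5]

total = 0
for score in scores:
    total += score
    print(f"Running total: {total}")  # indented: runs on every iteration

print(f"Final total: {total}")  # not indented: runs once, after the loop ends
```

The first message appears three times (3, 7, 12) and the second only once.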
python
# Broken indentation
for i in range(3):
print(i)  # IndentationError
# Correct indentation
for i in range(3):
    print(i)
Exercises
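Exercise 1: Score statistics
python
scores = [85, 92, 78, 90, 65, 88, 95, 70]
# Tasks:
# 1. Compute the mean score
# 2. Count how many scores pass (>= 60)
# 3. Find the highest and lowest scores
# 4. Compute the standard deviation (optional)
Exercise 2: Data cleaning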
python
raw_data = [
    {"name": "Alice", "age": 25, "income": 50000},
    {"name": "Bob", "age": -5, "income": 60000},  # invalid age
    {"name": "Carol", "age": 30, "income": None},  # missing income
    {"name": "David", "age": 150, "income": 70000},  # invalid age
    {"name": "Emma", "age": 28, "income": 55000},
]
# Tasks:
# 1. Select all valid records (age 18-100, income not missing)
# 2. Compute the mean age and mean income of the valid records
# 3. Print a cleaning report
Exercise 3: Cross-tabulation
python
# Build a gender × education cross-tabulation
data = [
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Master's"},
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Bachelor's"},
    {"gender": "Male", "education": "Master's"},
]
# Task: count the number of people in each combination
# Expected output:
# Male, Bachelor's: 2
# Male, Master's: 1
# Female, Bachelor's: 1
# Female, Master's: 1
Next steps
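Congratulations on finishing the Basic Syntax module! You have now covered:
- Variables and data types
- Operators
- Conditional statements
- Loops
In the next module we will study data structures (lists, dictionaries, and more), the core tools of data analysis.
Ready? Keep going!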