
Loops (for/while)

Making programs repeat work: the key to batch data processing


What Is a Loop?

A loop lets a program run the same code many times, so you don't have to write that code out again and again.

An everyday analogy

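  • Manual approach: check 1,000 questionnaires one by one (exhausting)
  • Loop approach: write the checking logic once and let it run 1,000 times automatically (easy)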

Comparison with Stata/R

  • Stata: foreach, forvalues, while
  • R: for, while, the apply family
  • Python: for, while

for Loops

1. Iterating over a list

python
# List of students
students = ["Alice", "Bob", "Carol", "David"]

# Print each one in turn
for student in students:
    print(f"Student: {student}")

# Output:
# Student: Alice
# Student: Bob
# Student: Carol
# Student: David

Syntax

python
for variable in sequence:
    code to execute
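The sequence in a `for` loop does not have to be a list. Any iterable works. The minimal sketch below uses small made-up values to show three other cases. A string yields its characters, and a tuple yields its elements in order. Looping over a dictionary directly yields its keys, while `.items()` yields (key, value) pairs.

```python
# Hypothetical example data, for illustration only
word = "abc"
scores = (85, 92, 78)
respondent = {"age": 25, "income": 50000}

# A string yields one character at a time
letters = []
for ch in word:
    letters.append(ch)

# A tuple yields its elements in order
total = 0
for s in scores:
    total += s

# Looping over a dict directly yields its keys
keys = []
for key in respondent:
    keys.append(key)

# .items() yields (key, value) pairs
for key, value in respondent.items():
    print(f"{key} = {value}")

print(letters)  # ['a', 'b', 'c']
print(total)    # 255
print(keys)     # ['age', 'income']
```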

2. Iterating over a range of numbers

python
# range(n): generates 0 through n-1
for i in range(5):
    print(i)

# Output: 0 1 2 3 4 (one number per line)

# range(start, end): generates start through end-1
for i in range(1, 6):
    print(i)

# Output: 1 2 3 4 5

# range(start, end, step): sets the step size
for i in range(0, 10, 2):
    print(i)

# Output: 0 2 4 6 8
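Wrapping a `range` in `list()` shows exactly which numbers it produces, which makes this a handy way to check the end-value rule. A range also accepts a negative step for counting down. This sketch uses only the built-ins `range` and `list`.

```python
# list() materializes the numbers a range would produce
up = list(range(1, 6))          # [1, 2, 3, 4, 5]
evens = list(range(0, 10, 2))   # [0, 2, 4, 6, 8]

# A negative step counts down; the end value is still excluded
down = list(range(5, 0, -1))    # [5, 4, 3, 2, 1]

# With a positive step and start >= end, the range is empty
empty = list(range(5, 1))       # []

print(up, evens, down, empty)
```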

3. Comparison with Stata/R

Stata example

stata
* Stata
forvalues i = 1/5 {
    display `i'
}

foreach var in age income education {
    summarize `var'
}

R example

r
# R
for (i in 1:5) {
  print(i)
}

for (var in c("age", "income", "education")) {
  print(summary(data[[var]]))
}

Python example

python
# Python
for i in range(1, 6):
    print(i)

variables = ["age", "income", "education"]
for var in variables:
    print(f"Summary statistics for variable {var}")
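The Python loop above only prints a label for each variable. Stata's `summarize` actually reports the number of observations, the mean, the standard deviation, the minimum and the maximum. The minimal sketch below mirrors that output. It assumes the data are stored as a plain dictionary of lists, and the dataset is made up for illustration. It uses the standard-library `statistics` module, where `statistics.stdev` is the sample (n-1) standard deviation.

```python
import statistics

# Hypothetical dataset: each variable name maps to a list of values
data = {
    "age": [25, 30, 45, 28, 35],
    "income": [50000, 75000, 120000, 55000, 95000],
    "education": [12, 16, 18, 16, 14],
}

summary = {}
for var in ["age", "income", "education"]:
    values = data[var]
    summary[var] = {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }
    s = summary[var]
    print(f"{var}: n={s['n']}, mean={s['mean']:.2f}, sd={s['sd']:.2f}, "
          f"min={s['min']}, max={s['max']}")
```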

Worked examples: data processing

Example 1: Computing BMI in batch

python
# Respondent data
heights = [170, 165, 180, 175, 160]  # cm
weights = [65, 55, 80, 70, 50]       # kg

# Compute BMI for every respondent: weight (kg) / height (m) squared
bmis = []
for i in range(len(heights)):
    height_m = heights[i] / 100
    bmi = weights[i] / (height_m ** 2)
    bmis.append(bmi)
    print(f"Respondent {i+1}: BMI = {bmi:.2f}")

# Output:
# Respondent 1: BMI = 22.49
# Respondent 2: BMI = 20.20
# Respondent 3: BMI = 24.69
# ...

A more Pythonic version (using zip and enumerate)

python
heights = [170, 165, 180, 175, 160]
weights = [65, 55, 80, 70, 50]

for i, (h, w) in enumerate(zip(heights, weights), start=1):
    height_m = h / 100
    bmi = w / (height_m ** 2)
    print(f"Respondent {i}: BMI = {bmi:.2f}")

Example 2: Data quality check

python
# Survey data (age column)
ages = [25, 30, -5, 150, 45, 28, 0, 35]

# Check for out-of-range values
print("=== Age data quality check ===")
valid_count = 0
invalid_count = 0
suspicious_count = 0

for i, age in enumerate(ages, start=1):
    if age < 0 or age > 120:
        print(f"[INVALID] Item {i}: {age}")
        invalid_count += 1
    elif age == 0:
        print(f"[SUSPICIOUS] Item {i}: {age}")
        suspicious_count += 1
    else:
        print(f"[OK] Item {i}: {age}")
        valid_count += 1

print(f"\nSummary: {valid_count} valid, {invalid_count} invalid, {suspicious_count} suspicious")
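Instead of keeping one counter variable per category, the loop can assign a label to each value and let `collections.Counter` from the standard library do the tallying. The sketch below uses the same age list and the same thresholds as the check above. The helper name `classify_age` is made up for this example.

```python
from collections import Counter

ages = [25, 30, -5, 150, 45, 28, 0, 35]

def classify_age(age):
    """Hypothetical helper: applies the same rules as the loop above."""
    if age < 0 or age > 120:
        return "invalid"
    elif age == 0:
        return "suspicious"
    return "valid"

counts = Counter()
for age in ages:
    counts[classify_age(age)] += 1  # missing keys start at 0 in a Counter

print(dict(counts))  # {'valid': 5, 'invalid': 2, 'suspicious': 1}
```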

Example 3: Counting respondents by income group

python
# Respondent incomes
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]

# Count respondents in each group
low_income = 0    # < 50000
mid_income = 0    # 50000 to 100000 (inclusive)
high_income = 0   # > 100000

for income in incomes:
    if income < 50000:
        low_income += 1
    elif income <= 100000:
        mid_income += 1
    else:
        high_income += 1

print(f"Low income: {low_income}")
print(f"Middle income: {mid_income}")
print(f"High income: {high_income}")

# Output:
# Low income: 2
# Middle income: 3
# High income: 2
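Three separate counter variables work fine for three groups. The counts can also live in a single dictionary, which scales more easily to additional groups and can be printed with a loop. The sketch below keeps the same cutoffs: below 50,000, 50,000 to 100,000 inclusive, and above 100,000.

```python
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]

# One dictionary holds all group counts
group_counts = {"low": 0, "mid": 0, "high": 0}

for income in incomes:
    if income < 50000:
        group = "low"
    elif income <= 100000:
        group = "mid"
    else:
        group = "high"
    group_counts[group] += 1

for group, n in group_counts.items():
    print(f"{group}: {n}")
# low: 2
# mid: 3
# high: 2
```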

while Loops

Basic syntax

python
# A while loop keeps running as long as its condition is true
count = 0

while count < 5:
    print(f"Count: {count}")
    count += 1

# Output: Count: 0, Count: 1, ..., Count: 4 (one per line)
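A `while` loop is most useful when you don't know in advance how many repetitions are needed. The sketch below counts how many years it takes for an income to double. The starting income and the 5% annual growth rate are assumptions chosen purely for illustration.

```python
income = 50000.0    # hypothetical starting income
target = 2 * income
growth_rate = 0.05  # assumed 5% growth per year
years = 0

# The number of iterations is not known upfront; the condition decides
while income < target:
    income *= 1 + growth_rate
    years += 1

print(f"Income doubles after {years} years")  # Income doubles after 15 years
```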

A practical scenario: data collection

python
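# Simulate collecting survey responses until 100 have been received
responses = 0
target = 100

while responses < target:
    # Simulate receiving one response (in practice, e.g., read from a database)
    responses += 1

    if responses % 10 == 0:  # Report progress every 10 responses
        progress = (responses / target) * 100
        print(f"Progress: {progress:.0f}% ({responses}/{target})")

print("Data collection complete!")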

Avoiding infinite loops

python
# Wrong: infinite loop (it never stops)
count = 0
while count < 5:
    print(count)
    # Forgot to increment count! The program hangs

# Correct
count = 0
while count < 5:
    print(count)
    count += 1  # Guarantees the condition eventually becomes False
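One defensive pattern is to add an iteration cap, so that a bug in the update step cannot hang the program. This is a minimal sketch of that idea rather than a required rule. The cap value of 1000 and the names `max_iterations` and `iterations` are illustrative.

```python
count = 0
max_iterations = 1000  # safety cap (illustrative value)
iterations = 0

while count < 5:
    iterations += 1
    if iterations > max_iterations:
        # Bail out instead of looping forever if count is never updated
        print("Stopped: too many iterations")
        break
    print(count)
    count += 1
```

If the `count += 1` line were deleted, this loop would print the message and stop after 1000 passes instead of hanging.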

Loop control: break and continue

1. break: exit the loop early

python
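# Stop at the first invalid value
ages = [25, 30, 45, -5, 28, 35]

for i, age in enumerate(ages, start=1):
    if age < 0 or age > 120:
        print(f"[INVALID] Found invalid value (item {i}): {age}")
        print("Stopping the check")
        break  # Exit the loop immediately
    else:
        print(f"[OK] Item {i} is valid")

# Output:
# [OK] Item 1 is valid
# [OK] Item 2 is valid
# [OK] Item 3 is valid
# [INVALID] Found invalid value (item 4): -5
# Stopping the check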
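In the loop above, the `else` belongs to the `if`. Python also lets you attach an `else` to the `for` statement itself. That block runs only when the loop finishes without hitting `break`, which makes it easy to report that no invalid data was found. The sketch below wraps the check in a function and calls it on two made-up lists. The function name `first_invalid` is hypothetical.

```python
def first_invalid(ages):
    """Return the 1-based position of the first out-of-range age, or None."""
    for i, age in enumerate(ages, start=1):
        if age < 0 or age > 120:
            print(f"Invalid value found (item {i}): {age}")
            break
    else:
        # Runs only when the loop was NOT ended by break
        print("All values are valid")
        return None
    return i

print(first_invalid([25, 30, 45, -5, 28, 35]))  # -> 4
print(first_invalid([25, 30, 45]))              # -> None
```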

2. continue: skip the current iteration

python
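# Process only even numbers
for i in range(10):
    if i % 2 != 0:  # If i is odd
        continue    # Skip the rest of the body and go to the next iteration

    print(f"{i} is even")

# Output (one per line): 0 is even, 2 is even, 4 is even, 6 is even, 8 is even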

In practice: skipping missing data

python
# Survey responses (None marks a missing value)
responses = [5, 4, None, 3, None, 5, 4, 2]

# Compute the mean score, skipping missing values
total = 0
count = 0

for response in responses:
    if response is None:
        continue  # Skip missing values

    total += response
    count += 1

if count > 0:
    average = total / count
    print(f"Mean score: {average:.2f} (valid N: {count})")

# Output: Mean score: 3.83 (valid N: 6)
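Another way to structure the same calculation is to separate the two steps. First collect the non-missing values, then pass them to `statistics.mean` from the standard library. This is one equivalent approach among several. Note the use of `is not None` in the filter. A plain `if r` would also drop a legitimate score of 0.

```python
import statistics

responses = [5, 4, None, 3, None, 5, 4, 2]

# Step 1: collect the non-missing values
valid = []
for r in responses:
    if r is not None:
        valid.append(r)

# Step 2: compute the mean with the standard library
if valid:
    average = statistics.mean(valid)
    print(f"Mean score: {average:.2f} (valid N: {len(valid)})")
# Mean score: 3.83 (valid N: 6)
```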

Advanced loop techniques

1. List comprehensions

A more concise way to write a loop that builds a list:

python
# Traditional for loop
squares = []
for i in range(10):
    squares.append(i ** 2)

# List comprehension (a single line)
squares = [i ** 2 for i in range(10)]
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# List comprehension with a condition:
# keep only the squares of even numbers
even_squares = [i ** 2 for i in range(10) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

Social science applications

python
import math

# Filter high-income respondents
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]
high_incomes = [inc for inc in incomes if inc > 100000]
print(high_incomes)  # [120000, 150000]

# Log-transform incomes (math.log is defined only for positive values)
log_incomes = [math.log(inc) for inc in incomes if inc > 0]
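A trailing `if` in a comprehension filters items out. To keep every item and transform it differently depending on a condition, put a conditional expression before `for` instead. This is what a Stata user would call recoding. The sketch below uses the same income list and assumes the same 100,000 cutoff.

```python
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]

# Trailing "if" filters: only matching items are kept
high_only = [inc for inc in incomes if inc > 100000]

# "x if cond else y" before "for" recodes: every item is kept
income_group = ["high" if inc > 100000 else "not high" for inc in incomes]

# The same idea produces a 0/1 dummy variable
high_dummy = [1 if inc > 100000 else 0 for inc in incomes]

print(high_only)     # [120000, 150000]
print(income_group)  # ['not high', 'not high', 'high', 'not high', 'not high', 'not high', 'high']
print(high_dummy)    # [0, 0, 1, 0, 0, 0, 1]
```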

2. enumerate(): get the index and the value together

python
students = ["Alice", "Bob", "Carol"]

# Not recommended (managing the index by hand)
for i in range(len(students)):
    print(f"{i+1}. {students[i]}")

# Recommended (using enumerate)
for i, student in enumerate(students, start=1):
    print(f"{i}. {student}")

# Output:
# 1. Alice
# 2. Bob
# 3. Carol
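`enumerate` produces (index, value) pairs, so it combines naturally with a comprehension to find where problem values sit. The sketch below finds the positions of out-of-range ages in a made-up list.

```python
ages = [25, 30, -5, 150, 45]

# enumerate yields (index, value) tuples
pairs = list(enumerate(ages, start=1))
print(pairs[:2])  # [(1, 25), (2, 30)]

# 1-based positions of out-of-range ages
bad_positions = [i for i, age in enumerate(ages, start=1) if age < 0 or age > 120]
print(bad_positions)  # [3, 4]
```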

3. zip(): iterate over several lists in parallel

python
names = ["Alice", "Bob", "Carol"]
ages = [25, 30, 28]
majors = ["Economics", "Sociology", "Political Science"]

for name, age, major in zip(names, ages, majors):
    print(f"{name}, age {age}, major: {major}")

# Output:
# Alice, age 25, major: Economics
# Bob, age 30, major: Sociology
# Carol, age 28, major: Political Science

4. Nested loops

python
# Generate every possible respondent profile
genders = ["Male", "Female"]
age_groups = ["18-30", "31-45", "46-60"]
education_levels = ["High School", "Bachelor's", "Master's"]

print("=== All possible respondent types ===")
count = 0
for gender in genders:
    for age_group in age_groups:
        for education in education_levels:
            count += 1
            print(f"{count}. {gender}, {age_group}, {education}")

# Total: 2 × 3 × 3 = 18 combinations
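The standard library's `itertools.product` produces the same combinations as the three nested loops, in the same order (the last list varies fastest). It does so without the extra levels of indentation.

```python
from itertools import product

genders = ["Male", "Female"]
age_groups = ["18-30", "31-45", "46-60"]
education_levels = ["High School", "Bachelor's", "Master's"]

combos = list(product(genders, age_groups, education_levels))
for count, (gender, age_group, education) in enumerate(combos, start=1):
    print(f"{count}. {gender}, {age_group}, {education}")

print(len(combos))  # 18
```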

Complete example: batch-processing survey data

python
# === Simulated survey data ===
survey_data = [
    {"id": 1, "age": 25, "income": 50000, "satisfaction": 4},
    {"id": 2, "age": -5, "income": 75000, "satisfaction": 5},  # Invalid age
    {"id": 3, "age": 30, "income": -10000, "satisfaction": 3}, # Invalid income
    {"id": 4, "age": 28, "income": 60000, "satisfaction": 6},  # Invalid satisfaction
    {"id": 5, "age": 35, "income": 80000, "satisfaction": 4},
    {"id": 6, "age": 40, "income": 95000, "satisfaction": 5},
]

# === Data quality check ===
print("=== Survey data quality check ===\n")

valid_responses = []
invalid_responses = []

for response in survey_data:
    resp_id = response["id"]
    age = response["age"]
    income = response["income"]
    satisfaction = response["satisfaction"]

    # Validation rules
    errors = []

    if age < 18 or age > 100:
        errors.append(f"invalid age ({age})")

    if income < 0:
        errors.append(f"invalid income ({income})")

    if satisfaction < 1 or satisfaction > 5:
        errors.append(f"invalid satisfaction ({satisfaction})")

    # Classify the response
    if errors:
        print(f"[FAIL] Questionnaire {resp_id}: {', '.join(errors)}")
        invalid_responses.append(response)
    else:
        print(f"[PASS] Questionnaire {resp_id}: passed")
        valid_responses.append(response)

# === Summary ===
print("\n=== Summary ===")
print(f"Total questionnaires: {len(survey_data)}")
print(f"Valid: {len(valid_responses)}")
print(f"Invalid: {len(invalid_responses)}")
print(f"Valid rate: {len(valid_responses)/len(survey_data)*100:.1f}%")

# === Descriptive statistics (valid data only) ===
if valid_responses:
    print("\n=== Descriptive statistics for valid data ===")

    # Age
    ages = [r["age"] for r in valid_responses]
    avg_age = sum(ages) / len(ages)
    print(f"Mean age: {avg_age:.1f} years")

    # Income
    incomes = [r["income"] for r in valid_responses]
    avg_income = sum(incomes) / len(incomes)
    print(f"Mean income: ${avg_income:,.0f}")

    # Satisfaction
    satisfactions = [r["satisfaction"] for r in valid_responses]
    avg_satisfaction = sum(satisfactions) / len(satisfactions)
    print(f"Mean satisfaction: {avg_satisfaction:.2f} / 5.0")

Output

=== Survey data quality check ===

[PASS] Questionnaire 1: passed
[FAIL] Questionnaire 2: invalid age (-5)
[FAIL] Questionnaire 3: invalid income (-10000)
[FAIL] Questionnaire 4: invalid satisfaction (6)
[PASS] Questionnaire 5: passed
[PASS] Questionnaire 6: passed

=== Summary ===
Total questionnaires: 6
Valid: 3
Invalid: 3
Valid rate: 50.0%

=== Descriptive statistics for valid data ===
Mean age: 33.3 years
Mean income: $75,000
Mean satisfaction: 4.33 / 5.0
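As the number of rules grows, it can help to move the checks into a function. The loop body then stays short, and the rules can be tested on their own. The sketch below applies the same three rules to a subset of the records. The function name `validate_response` is hypothetical.

```python
def validate_response(response):
    """Return a list of error messages for one record (empty if it passes)."""
    errors = []
    if response["age"] < 18 or response["age"] > 100:
        errors.append(f"invalid age ({response['age']})")
    if response["income"] < 0:
        errors.append(f"invalid income ({response['income']})")
    if not 1 <= response["satisfaction"] <= 5:
        errors.append(f"invalid satisfaction ({response['satisfaction']})")
    return errors

survey_data = [
    {"id": 1, "age": 25, "income": 50000, "satisfaction": 4},
    {"id": 2, "age": -5, "income": 75000, "satisfaction": 5},
    {"id": 3, "age": 30, "income": -10000, "satisfaction": 3},
]

# An empty error list is falsy, so "not validate_response(r)" means "passed"
valid_responses = [r for r in survey_data if not validate_response(r)]

for r in survey_data:
    errors = validate_response(r)
    status = "passed" if not errors else ", ".join(errors)
    print(f"Questionnaire {r['id']}: {status}")
```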

Common mistakes

Mistake 1: Modifying a list while iterating over it

python
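# Dangerous: removing items from the list you are looping over
numbers = [1, 2, 4, 5]
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)  # Later items shift left, so the loop skips one
print(numbers)  # [1, 4, 5]: the 4 was never checked

# Correct: build a new list instead
numbers = [1, 2, 4, 5]
odd_numbers = [num for num in numbers if num % 2 != 0]
print(odd_numbers)  # [1, 5]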
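If the original list really must be changed in place, there are two common options. One is to loop over a copy (`numbers[:]`) while removing items from the original. The other is to assign a filtered comprehension back to the full slice. The sketch below shows both.

```python
numbers = [1, 2, 4, 5]

# Option 1: iterate over a copy; removing from the original no longer affects the loop
for num in numbers[:]:
    if num % 2 == 0:
        numbers.remove(num)
print(numbers)  # [1, 5]

# Option 2: replace the contents in place with slice assignment
values = [1, 2, 4, 5]
values[:] = [v for v in values if v % 2 != 0]
print(values)  # [1, 5]
```

Slice assignment keeps the same list object, so any other variable that refers to `values` sees the change as well.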

Mistake 2: The end value of range()

python
# Common misconception: range(5) does not include 5
for i in range(5):
    print(i)
# Output: 0 1 2 3 4 (5 is not included!)

# To include 5
for i in range(1, 6):
    print(i)
# Output: 1 2 3 4 5

Mistake 3: Indentation errors

python
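# Wrong: the loop body is not indented
for i in range(3):
print(i)  # IndentationError: expected an indented block

# Correct: indent the loop body (4 spaces by convention)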
for i in range(3):
    print(i)

Exercises

Exercise 1: Score statistics

python
scores = [85, 92, 78, 90, 65, 88, 95, 70]

# Tasks:
# 1. Compute the mean score
# 2. Count how many scores pass (>= 60)
# 3. Find the highest and lowest scores
# 4. Compute the standard deviation (optional)

Exercise 2: Data cleaning

python
raw_data = [
    {"name": "Alice", "age": 25, "income": 50000},
    {"name": "Bob", "age": -5, "income": 60000},      # Invalid age
    {"name": "Carol", "age": 30, "income": None},     # Missing income
    {"name": "David", "age": 150, "income": 70000},   # Invalid age
    {"name": "Emma", "age": 28, "income": 55000},
]

# Tasks:
# 1. Keep only the valid records (age 18 to 100, income not missing)
# 2. Compute the mean age and mean income of the valid records
# 3. Print a cleaning report

Exercise 3: Cross-tabulation

python
# Build a gender × education cross-tabulation
data = [
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Master's"},
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Bachelor's"},
    {"gender": "Male", "education": "Master's"},
]

# Task: count the number of people in each combination
# Expected output:
# Male, Bachelor's: 2
# Male, Master's: 1
# Female, Bachelor's: 1
# Female, Master's: 1

Next steps

Congratulations on completing the Basic Syntax module! You have now learned:

  • Variables and data types
  • Operators
  • Conditional statements
  • Loops

In the next module we will cover data structures (lists, dictionaries, and more), the core tools of data analysis.

Ready? Let's keep going!

Released under the MIT License. Content copyright belongs to the author.