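Loops (for/while)

Make programs repeat work: the key to batch data processing

What is a loop?

A loop lets a program run the same block of code repeatedly, so you do not have to write that code out again and again.

Everyday analogy:

By hand: check 1,000 questionnaires one at a time (exhausting)
With a loop: write the checking logic once and have it applied 1,000 times automatically (easy)

Compared with Stata/R:

Stata: foreach, forvalues, while
R: for, while, the apply family
Python: for, while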

for loops

1. Iterating over a list

python

# List of students
students = ["Alice", "Bob", "Carol", "David"]

# Print each one in turn
for student in students:
    print(f"Student: {student}")

# Output:
# Student: Alice
# Student: Bob
# Student: Carol
# Student: David

Syntax:

python

for variable in sequence:
    code to run

2. Iterating over a range of numbers

python

# range(n): generates 0 through n-1
for i in range(5):
    print(i)

# Output: 0 1 2 3 4

# range(start, end): generates start through end-1
for i in range(1, 6):
    print(i)

# Output: 1 2 3 4 5

# range(start, end, step): sets the step size
for i in range(0, 10, 2):
    print(i)

# Output: 0 2 4 6 8
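
A quick way to check what a given range() call produces is to wrap it in list(). The sketch below does this for the three forms above and, as an extra that the text above does not cover, for a negative step that counts down.

```python
# Materialize each range as a list so the values are easy to inspect.
first_five = list(range(5))        # 0 up to, but not including, 5
one_to_five = list(range(1, 6))    # 1 up to, but not including, 6
evens = list(range(0, 10, 2))      # every second number below 10
countdown = list(range(5, 0, -1))  # negative step counts down and stops before 0

print(first_five)   # [0, 1, 2, 3, 4]
print(one_to_five)  # [1, 2, 3, 4, 5]
print(evens)        # [0, 2, 4, 6, 8]
print(countdown)    # [5, 4, 3, 2, 1]
```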

3. Comparison with Stata/R

Stata example:

stata

* Stata
forvalues i = 1/5 {
    display `i'
}

foreach var in age income education {
    summarize `var'
}

R example:

r

# R
for (i in 1:5) {
  print(i)
}

for (var in c("age", "income", "education")) {
  print(summary(data[[var]]))
}

Python example:

python

# Python
for i in range(1, 6):
    print(i)

variables = ["age", "income", "education"]
for var in variables:
    print(f"Summary statistics for variable {var}")

Worked examples: data processing

Example 1: Computing BMI in bulk

python

# Respondent data
heights = [170, 165, 180, 175, 160]  # cm
weights = [65, 55, 80, 70, 50]       # kg

# Compute BMI for every respondent
bmis = []
for i in range(len(heights)):
    height_m = heights[i] / 100
    bmi = weights[i] / (height_m ** 2)
    bmis.append(bmi)
    print(f"Respondent {i+1}: BMI = {bmi:.2f}")

# Output:
# Respondent 1: BMI = 22.49
# Respondent 2: BMI = 20.20
# Respondent 3: BMI = 24.69
# ...

A more Pythonic version (using zip):

python

heights = [170, 165, 180, 175, 160]
weights = [65, 55, 80, 70, 50]

for i, (h, w) in enumerate(zip(heights, weights), start=1):
    height_m = h / 100
    bmi = w / (height_m ** 2)
    print(f"Respondent {i}: BMI = {bmi:.2f}")

Example 2: Data quality check

python

# Survey data (age column)
ages = [25, 30, -5, 150, 45, 28, 0, 35]

# Check for outliers
print("=== Age data quality check ===")
valid_count = 0
invalid_count = 0

for i, age in enumerate(ages, start=1):
    if age < 0 or age > 120:
        print(f"Entry {i} is invalid: {age}")
        invalid_count += 1
    elif age == 0:
        # Suspicious entries are reported but counted as neither valid nor invalid
        print(f"Entry {i} is suspicious: {age}")
    else:
        print(f"Entry {i} is valid: {age}")
        valid_count += 1

print(f"\nSummary: {valid_count} valid, {invalid_count} invalid")

Example 3: Counting respondents by income group

python

# Respondent incomes
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]

# Count by group
low_income = 0    # < 50000
mid_income = 0    # 50000-100000
high_income = 0   # > 100000

for income in incomes:
    if income < 50000:
        low_income += 1
    elif income <= 100000:
        mid_income += 1
    else:
        high_income += 1

print(f"Low income: {low_income} people")
print(f"Middle income: {mid_income} people")
print(f"High income: {high_income} people")

# Output:
# Low income: 2 people
# Middle income: 3 people
# High income: 2 people
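
The three separate counters above can also be kept in a single dictionary, which is easier to extend if more income bands are added. The sketch below is one possible arrangement; the helper name `income_group` and the band labels are illustrative choices, not part of the original example.

```python
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]


def income_group(income):
    """Return the band label for one income (hypothetical helper)."""
    if income < 50000:
        return "low"
    elif income <= 100000:
        return "mid"
    return "high"


# Start every band at zero so empty bands still appear in the result.
counts = {"low": 0, "mid": 0, "high": 0}
for income in incomes:
    counts[income_group(income)] += 1

for label, n in counts.items():
    print(f"{label}: {n}")
```

The cut-offs are the same as in the example above, so the counts should match it.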

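while loops

Basic syntax

python

# while loop: keeps running as long as the condition is true
count = 0

while count < 5:
    print(f"Count: {count}")
    count += 1

# Output:
# Count: 0
# Count: 1
# Count: 2
# Count: 3
# Count: 4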

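Practical scenario: data collection

python

# Simulate collecting survey responses (until 100 have been collected)
responses = 0
target = 100

while responses < target:
    # Simulate collecting one response (in practice, read from a database)
    responses += 1

    if responses % 10 == 0:  # Report progress every 10 responses
        progress = (responses / target) * 100
        print(f"Progress: {progress:.0f}% ({responses}/{target})")

print("Data collection complete!")

Avoiding infinite loops

python

# Wrong: infinite loop (never stops)
count = 0
while count < 5:
    print(count)
    # count is never increased, so the program hangs

# Right
count = 0
while count < 5:
    print(count)
    count += 1  # Make sure the condition eventually becomes False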
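
When the stopping condition depends on a value that might never reach its target, a common safeguard is to cap the number of iterations. The sketch below is a minimal illustration with made-up numbers; `max_iterations` is a hypothetical limit chosen for the example.

```python
value = 1000.0
threshold = 1.0
max_iterations = 100  # hypothetical safety cap
iterations = 0

while value > threshold:
    value /= 2
    iterations += 1
    if iterations >= max_iterations:
        # Give up rather than loop forever if the condition never turns False.
        print("Stopped: reached the iteration cap")
        break

print(f"Finished after {iterations} iterations, value = {value:.3f}")
```

Here the loop ends on its own well before the cap; the cap only matters if the update step fails to move `value` toward the threshold.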

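Loop control: break and continue

1. break: exit the loop early

python

# Stop at the first invalid value
ages = [25, 30, 45, -5, 28, 35]

for i, age in enumerate(ages, start=1):
    if age < 0 or age > 120:
        print(f"Found invalid value (entry {i}): {age}")
        print("Stopping the check")
        break  # Leave the loop immediately
    else:
        print(f"Entry {i} is valid")

# Output:
# Entry 1 is valid
# Entry 2 is valid
# Entry 3 is valid
# Found invalid value (entry 4): -5
# Stopping the check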

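2. continue: skip the current iteration

python

# Process even numbers only
for i in range(10):
    if i % 2 != 0:  # If the number is odd
        continue    # Skip the rest of the body and move to the next iteration

    print(f"{i} is even")

# Output:
# 0 is even
# 2 is even
# 4 is even
# 6 is even
# 8 is even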

In practice: skipping missing data

python

# Survey responses (None marks a missing value)
responses = [5, 4, None, 3, None, 5, 4, 2]

# Compute the average score (skipping missing values)
total = 0
count = 0

for response in responses:
    if response is None:
        continue  # Skip missing values

    total += response
    count += 1

if count > 0:
    average = total / count
    print(f"Average score: {average:.2f} (valid sample: {count})")

# Output: Average score: 3.83 (valid sample: 6)

Advanced loop techniques

1. List comprehensions

A more concise way to write a loop:

python

# Traditional for loop
squares = []
for i in range(10):
    squares.append(i ** 2)

# List comprehension (one line)
squares = [i ** 2 for i in range(10)]
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# List comprehension with a condition
# Keep only the squares of even numbers
even_squares = [i ** 2 for i in range(10) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

Social science applications:

python

# Select high-income respondents
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]
high_incomes = [inc for inc in incomes if inc > 100000]
print(high_incomes)  # [120000, 150000]

# Log-transform income (the natural log is only defined for positive values)
import math
log_incomes = [math.log(inc) for inc in incomes if inc > 0]
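
The missing-value average from the continue section can be written in the same style: filter out the missing values with a list comprehension, then average what is left. This sketch uses `statistics.mean` from the standard library; the variable name `valid` is an illustrative choice.

```python
from statistics import mean

responses = [5, 4, None, 3, None, 5, 4, 2]

# Keep only the answers that are not missing.
valid = [r for r in responses if r is not None]

if valid:  # mean() raises StatisticsError on an empty list
    average = mean(valid)
    print(f"Average score: {average:.2f} (valid sample: {len(valid)})")
```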

2. enumerate(): get the index and the value together

python

students = ["Alice", "Bob", "Carol"]

# Not recommended (managing the index by hand)
for i in range(len(students)):
    print(f"{i+1}. {students[i]}")

# Recommended (using enumerate)
for i, student in enumerate(students, start=1):
    print(f"{i}. {student}")

# Output:
# 1. Alice
# 2. Bob
# 3. Carol

3. zip(): iterate over several lists in parallel

python

names = ["Alice", "Bob", "Carol"]
ages = [25, 30, 28]
majors = ["Economics", "Sociology", "Political Science"]

for name, age, major in zip(names, ages, majors):
    print(f"{name}, age {age}, major: {major}")

# Output:
# Alice, age 25, major: Economics
# Bob, age 30, major: Sociology
# Carol, age 28, major: Political Science
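
One thing to keep in mind is that zip() stops at the shortest input, so a length mismatch between lists can silently drop records. The sketch below shows this with made-up data and a simple length check before relying on the pairs. (Python 3.10 and later also accept `zip(..., strict=True)`, which raises `ValueError` on a mismatch.)

```python
names = ["Alice", "Bob", "Carol"]
ages = [25, 30]  # one value is missing on purpose

pairs = list(zip(names, ages))
print(pairs)  # [('Alice', 25), ('Bob', 30)]; Carol is dropped

# A simple guard: compare lengths before trusting the zipped result.
lengths_match = len(names) == len(ages)
if not lengths_match:
    print(f"Length mismatch: {len(names)} names vs {len(ages)} ages")
```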

4. Nested loops

python

# Generate every possible combination of survey categories
genders = ["Male", "Female"]
age_groups = ["18-30", "31-45", "46-60"]
education_levels = ["High School", "Bachelor's", "Master's"]

print("=== All possible respondent types ===")
count = 0
for gender in genders:
    for age_group in age_groups:
        for education in education_levels:
            count += 1
            print(f"{count}. {gender}, {age_group}, {education}")

# Total: 2 × 3 × 3 = 18 combinations
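
For deeper nesting, the standard library's `itertools.product` can generate the same combinations without stacking loops. The following is a sketch of that alternative, not a replacement for understanding the nested form; the variable name `combinations` is an illustrative choice.

```python
from itertools import product

genders = ["Male", "Female"]
age_groups = ["18-30", "31-45", "46-60"]
education_levels = ["High School", "Bachelor's", "Master's"]

# product() yields every (gender, age_group, education) tuple,
# in the same order as the three nested loops.
combinations = list(product(genders, age_groups, education_levels))

for count, (gender, age_group, education) in enumerate(combinations, start=1):
    print(f"{count}. {gender}, {age_group}, {education}")

print(len(combinations))  # 18
```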

Full worked example: batch processing of survey data

python

# === Simulated survey data ===
survey_data = [
    {"id": 1, "age": 25, "income": 50000, "satisfaction": 4},
    {"id": 2, "age": -5, "income": 75000, "satisfaction": 5},  # invalid age
    {"id": 3, "age": 30, "income": -10000, "satisfaction": 3}, # invalid income
    {"id": 4, "age": 28, "income": 60000, "satisfaction": 6},  # invalid satisfaction
    {"id": 5, "age": 35, "income": 80000, "satisfaction": 4},
    {"id": 6, "age": 40, "income": 95000, "satisfaction": 5},
]

# === Data quality check ===
print("=== Survey data quality check ===\n")

valid_responses = []
invalid_responses = []

for response in survey_data:
    resp_id = response["id"]
    age = response["age"]
    income = response["income"]
    satisfaction = response["satisfaction"]

    # Validation rules
    errors = []

    if age < 18 or age > 100:
        errors.append(f"invalid age ({age})")

    if income < 0:
        errors.append(f"invalid income ({income})")

    if satisfaction < 1 or satisfaction > 5:
        errors.append(f"invalid satisfaction ({satisfaction})")

    # Classify the response
    if errors:
        print(f"Survey {resp_id}: {', '.join(errors)}")
        invalid_responses.append(response)
    else:
        print(f"Survey {resp_id}: passed")
        valid_responses.append(response)

# === Summary ===
print("\n=== Summary ===")
print(f"Total surveys: {len(survey_data)}")
print(f"Valid surveys: {len(valid_responses)}")
print(f"Invalid surveys: {len(invalid_responses)}")
print(f"Valid rate: {len(valid_responses)/len(survey_data)*100:.1f}%")

# === Descriptive statistics (valid data only) ===
if valid_responses:
    print("\n=== Descriptive statistics (valid data) ===")

    # Age
    ages = [r["age"] for r in valid_responses]
    avg_age = sum(ages) / len(ages)
    print(f"Average age: {avg_age:.1f} years")

    # Income
    incomes = [r["income"] for r in valid_responses]
    avg_income = sum(incomes) / len(incomes)
    print(f"Average income: ${avg_income:,.0f}")

    # Satisfaction
    satisfactions = [r["satisfaction"] for r in valid_responses]
    avg_satisfaction = sum(satisfactions) / len(satisfactions)
    print(f"Average satisfaction: {avg_satisfaction:.2f} / 5.0")

Output:

=== Survey data quality check ===

Survey 1: passed
Survey 2: invalid age (-5)
Survey 3: invalid income (-10000)
Survey 4: invalid satisfaction (6)
Survey 5: passed
Survey 6: passed

=== Summary ===
Total surveys: 6
Valid surveys: 3
Invalid surveys: 3
Valid rate: 50.0%

=== Descriptive statistics (valid data) ===
Average age: 33.3 years
Average income: $75,000
Average satisfaction: 4.33 / 5.0

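Common mistakes

Mistake 1: Modifying a list while iterating over it

python

# Dangerous
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)  # The element right after each removed item is skipped

# Correct: build a new list
numbers = [1, 2, 3, 4, 5]
odd_numbers = [num for num in numbers if num % 2 != 0]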
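
With [1, 2, 3, 4, 5] the in-place removal happens to leave the expected result, which can hide the problem. The sketch below uses a different, made-up list in which two even numbers sit next to each other, so the skipped element becomes visible. The name `buggy` is an illustrative choice.

```python
# Removing while iterating: the element after each removed item is skipped.
buggy = [1, 2, 4, 5]
for num in buggy:
    if num % 2 == 0:
        buggy.remove(num)
print(buggy)  # [1, 4, 5] (4 was never checked)

# Building a new list checks every element.
numbers = [1, 2, 4, 5]
odd_numbers = [num for num in numbers if num % 2 != 0]
print(odd_numbers)  # [1, 5]
```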

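Mistake 2: The end value of range()

python

# Common misunderstanding: expecting 5 to be printed
for i in range(5):
    print(i)
# Output: 0 1 2 3 4 (5 is not included!)

# To count from 1 through 5, use an end value of 6
for i in range(1, 6):
    print(i)
# Output: 1 2 3 4 5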

Mistake 3: Indentation errors

python

# Wrong: the loop body is not indented
for i in range(3):
print(i)  # IndentationError

# Right: the loop body is indented
for i in range(3):
    print(i)

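Exercises

Exercise 1: Score statistics

python

scores = [85, 92, 78, 90, 65, 88, 95, 70]

# Tasks:
# 1. Compute the average score
# 2. Count how many scores are passing (>= 60)
# 3. Find the highest and lowest scores
# 4. Compute the standard deviation (optional)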

Exercise 2: Data cleaning

python

raw_data = [
    {"name": "Alice", "age": 25, "income": 50000},
    {"name": "Bob", "age": -5, "income": 60000},      # invalid age
    {"name": "Carol", "age": 30, "income": None},     # missing income
    {"name": "David", "age": 150, "income": 70000},   # invalid age
    {"name": "Emma", "age": 28, "income": 55000},
]

# Tasks:
# 1. Keep only the valid records (age 18-100, income not missing)
# 2. Compute the average age and income of the valid records
# 3. Print a cleaning report

Exercise 3: Cross-tabulation

python

# Build a gender × education cross-tabulation
data = [
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Master's"},
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Bachelor's"},
    {"gender": "Male", "education": "Master's"},
]

# Task: count the number of people in each combination
# Expected output:
# Male, Bachelor's: 2
# Male, Master's: 1
# Female, Bachelor's: 1
# Female, Master's: 1

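Next steps

Congratulations on completing the Basic Syntax module! You have now learned:

Variables and data types
Operators
Conditional statements
Loops

In the next module we will cover data structures (lists, dictionaries, and so on), the core tools of data analysis.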
