Loops (for/while)

Making Programs Repeat Tasks — The Key to Batch Data Processing

What are Loops?

Loops let a program run the same block of code repeatedly, so you write the logic once instead of copying it for every item.

Everyday Analogy:

Manual approach: Check 1,000 questionnaires one by one (exhausting)
Loop approach: Write the checking logic once and let the program apply it to all 1,000 (easy)

Stata/R Comparison:

Stata: foreach, forvalues, while
R: for, while, apply family
Python: for, while

for Loop

1. Iterating Through Lists

python

# Student roster
students = ["Alice", "Bob", "Carol", "David"]

# Print one by one
for student in students:
    print(f"Student: {student}")

# Output:
# Student: Alice
# Student: Bob
# Student: Carol
# Student: David

Syntax:

python

for variable in sequence:
    pass  # replace with the code to execute for each item

2. Iterating Through Number Ranges

python

# range(n): generates 0 to n-1
for i in range(5):
    print(i)

# Output: 0 1 2 3 4

# range(start, end): generates start to end-1
for i in range(1, 6):
    print(i)

# Output: 1 2 3 4 5

# range(start, end, step): specify step size
for i in range(0, 10, 2):
    print(i)

# Output: 0 2 4 6 8
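
A quick way to check what a range() call produces is to convert it to a list. The sketch below repeats the three forms shown above and adds a negative step, which is not covered above but is standard range() behavior: the sequence counts down and the end value is still excluded.

```python
# range() is lazy; wrap it in list() to see the numbers it produces
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(1, 6)))       # [1, 2, 3, 4, 5]
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]

# A negative step counts down; the end value (0) is still excluded
countdown = list(range(5, 0, -1))
print(countdown)               # [5, 4, 3, 2, 1]
```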

3. Stata/R Comparison

Stata Example:

stata

* Stata
forvalues i = 1/5 {
    display `i'
}

foreach var in age income education {
    summarize `var'
}

R Example:

r

# R
for (i in 1:5) {
  print(i)
}

for (var in c("age", "income", "education")) {
  print(summary(data[[var]]))
}

Python Example:

python

# Python
for i in range(1, 6):
    print(i)

variables = ["age", "income", "education"]
for var in variables:
    print(f"Statistics for variable {var}")
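
The Python loop above only prints a label, while the Stata and R versions actually summarize each variable. A closer equivalent might look like the sketch below. The `data` dictionary and its values are made up for illustration, and the standard-library `statistics` module stands in for what a data-analysis library would normally do.

```python
import statistics

# Hypothetical data: one list of values per variable (invented for this example)
data = {
    "age": [25, 30, 28, 35],
    "income": [45000, 75000, 55000, 95000],
    "education": [12, 16, 16, 18],   # years of schooling
}

summaries = {}
for var in ["age", "income", "education"]:
    values = data[var]
    summaries[var] = {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }
    s = summaries[var]
    print(f"{var}: n={s['n']}, mean={s['mean']}, min={s['min']}, max={s['max']}")
```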

Practical Cases: Data Processing

Case 1: Batch BMI Calculation

python

# Respondent data
heights = [170, 165, 180, 175, 160]  # cm
weights = [65, 55, 80, 70, 50]       # kg

# Batch calculate BMI
bmis = []
for i in range(len(heights)):
    height_m = heights[i] / 100
    bmi = weights[i] / (height_m ** 2)
    bmis.append(bmi)
    print(f"Respondent {i+1}: BMI = {bmi:.2f}")

# Output:
# Respondent 1: BMI = 22.49
# Respondent 2: BMI = 20.20
# Respondent 3: BMI = 24.69
# ...

More Pythonic Approach (Using zip):

python

heights = [170, 165, 180, 175, 160]
weights = [65, 55, 80, 70, 50]

for i, (h, w) in enumerate(zip(heights, weights), start=1):
    height_m = h / 100
    bmi = w / (height_m ** 2)
    print(f"Respondent {i}: BMI = {bmi:.2f}")
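
The zip version above prints each BMI but does not keep the results. If the values are needed later, one option is to wrap the loop in a small function that returns a list. The name `compute_bmis` is a hypothetical helper chosen for this sketch.

```python
def compute_bmis(heights_cm, weights_kg):
    """Return a list of BMI values, one per respondent."""
    bmis = []
    for h, w in zip(heights_cm, weights_kg):
        height_m = h / 100
        bmis.append(w / (height_m ** 2))
    return bmis

heights = [170, 165, 180, 175, 160]  # cm
weights = [65, 55, 80, 70, 50]       # kg

bmis = compute_bmis(heights, weights)
for i, bmi in enumerate(bmis, start=1):
    print(f"Respondent {i}: BMI = {bmi:.2f}")
```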

Case 2: Data Quality Check

python

# Survey data (age column)
ages = [25, 30, -5, 150, 45, 28, 0, 35]

# Check for outliers
print("=== Age Data Quality Check ===")
valid_count = 0
invalid_count = 0
suspicious_count = 0

for i, age in enumerate(ages, start=1):
    if age < 0 or age > 120:
        print(f"✗ Data {i} abnormal: {age}")
        invalid_count += 1
    elif age == 0:
        # Age 0 is within range but unlikely for a respondent; count it separately
        print(f"⚠️  Data {i} suspicious: {age}")
        suspicious_count += 1
    else:
        print(f"✓ Data {i} normal: {age}")
        valid_count += 1

print(f"\nSummary: {valid_count} normal, {suspicious_count} suspicious, {invalid_count} abnormal")

Case 3: Income Group Statistics

python

# Respondent incomes
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]

# Group statistics
low_income = 0    # < 50000
mid_income = 0    # 50000-100000
high_income = 0   # > 100000

for income in incomes:
    if income < 50000:
        low_income += 1
    elif income <= 100000:
        mid_income += 1
    else:
        high_income += 1

print(f"Low income: {low_income} people")
print(f"Middle income: {mid_income} people")
print(f"High income: {high_income} people")

# Output:
# Low income: 2 people
# Middle income: 3 people
# High income: 2 people
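
Three separate counter variables work for three groups but get unwieldy as the number of groups grows. One possible alternative, sketched below, keeps the counts in a dictionary and moves the cutoffs into a helper. The function name `income_group` and the group labels are assumptions made for this example; the cutoffs are the same as above.

```python
def income_group(income):
    """Map an income to a group label using the cutoffs from Case 3."""
    if income < 50000:
        return "low"
    elif income <= 100000:
        return "middle"
    else:
        return "high"

incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]

counts = {"low": 0, "middle": 0, "high": 0}
for income in incomes:
    counts[income_group(income)] += 1

for group, n in counts.items():
    print(f"{group}: {n} people")
```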

while Loop

Basic Syntax

python

# while loop: continues as long as condition is true
count = 0

while count < 5:
    print(f"Count: {count}")
    count += 1

# Output:
# Count: 0
# Count: 1
# Count: 2
# Count: 3
# Count: 4

Practical Scenario: Data Collection

python

# Simulate survey data collection (until 100 responses collected)
responses = 0
target = 100

while responses < target:
    # Simulate data collection (in practice, read from database)
    responses += 1

    if responses % 10 == 0:  # Show progress every 10
        progress = (responses / target) * 100
        print(f"Progress: {progress:.0f}% ({responses}/{target})")

print("✓ Data collection complete!")

Avoid Infinite Loops

python

# ✗ Infinite loop (never stops)
count = 0
while count < 5:
    print(count)
    # Forgot to increment count! Program hangs

# ✓ Correct approach
count = 0
while count < 5:
    print(count)
    count += 1  # Ensure condition eventually becomes False
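
When it is hard to be sure the condition will ever become False, a common safeguard is to cap the number of iterations as well. The sketch below uses a made-up scenario (halving a value until it falls below a threshold); `max_iterations` is an arbitrary limit chosen for illustration.

```python
# Hypothetical scenario: halve a value until it drops below a threshold,
# but never loop more than max_iterations times.
value = 1000.0
threshold = 1.0
max_iterations = 50   # safety cap (arbitrary limit for this example)
iterations = 0

while value >= threshold and iterations < max_iterations:
    value /= 2
    iterations += 1

print(f"Stopped after {iterations} iterations, value = {value:.3f}")
# Stopped after 10 iterations, value = 0.977
```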

Loop Control: break and continue

1. break: Exit Loop Early

python

# Stop at first invalid data
ages = [25, 30, 45, -5, 28, 35]

for i, age in enumerate(ages, start=1):
    if age < 0 or age > 120:
        print(f"✗ Abnormal data found (#{i}): {age}")
        print("Stopping check")
        break  # Immediately exit loop
    else:
        print(f"✓ Data {i} normal")

# Output:
# ✓ Data 1 normal
# ✓ Data 2 normal
# ✓ Data 3 normal
# ✗ Abnormal data found (#4): -5
# Stopping check
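
The same break pattern is useful when you only need to know where the first problem is. The sketch below wraps it in a function; `first_invalid_position` and its `low`/`high` parameters are hypothetical names, with defaults matching the 0 to 120 range used above.

```python
def first_invalid_position(ages, low=0, high=120):
    """Return the 1-based position of the first out-of-range age, or None."""
    position = None
    for i, age in enumerate(ages, start=1):
        if age < low or age > high:
            position = i
            break  # no need to look at the remaining values
    return position

print(first_invalid_position([25, 30, 45, -5, 28, 35]))  # 4
print(first_invalid_position([25, 30, 45]))              # None
```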

2. continue: Skip Current Iteration

python

# Process only even numbers
for i in range(10):
    if i % 2 != 0:  # If odd
        continue    # Skip remaining code, go to next iteration

    print(f"{i} is even")

# Output:
# 0 is even
# 2 is even
# 4 is even
# 6 is even
# 8 is even

Practical: Skip Missing Data

python

# Survey data (None represents missing)
responses = [5, 4, None, 3, None, 5, 4, 2]

# Calculate average score (skip missing values)
total = 0
count = 0

for response in responses:
    if response is None:
        continue  # Skip missing values

    total += response
    count += 1

if count > 0:
    average = total / count
    print(f"Average score: {average:.2f} (valid samples: {count})")

# Output: Average score: 3.83 (valid samples: 6)
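
The `if count > 0` check above matters when every value is missing: without it, the division would raise ZeroDivisionError. A reusable version of the same logic might look like this sketch, where `mean_ignoring_missing` is a hypothetical helper that returns None when there is nothing to average.

```python
def mean_ignoring_missing(values):
    """Average the non-missing values; return None if there are none."""
    total = 0
    count = 0
    for value in values:
        if value is None:
            continue  # skip missing values
        total += value
        count += 1
    if count == 0:
        return None
    return total / count

print(mean_ignoring_missing([5, 4, None, 3, None, 5, 4, 2]))  # about 3.83
print(mean_ignoring_missing([None, None]))                    # None
```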

Advanced Loop Techniques

1. List Comprehensions

More concise loop syntax:

python

# Traditional for loop
squares = []
for i in range(10):
    squares.append(i ** 2)

# List comprehension (one line)
squares = [i ** 2 for i in range(10)]
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# List comprehension with condition
# Keep only squares of even numbers
even_squares = [i ** 2 for i in range(10) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

Social Science Application:

python

# Filter high-income respondents
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]
high_incomes = [inc for inc in incomes if inc > 100000]
print(high_incomes)  # [120000, 150000]

# Log transformation of incomes
import math
log_incomes = [math.log(inc) for inc in incomes if inc > 0]
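
An `if` at the end of a comprehension filters items out, as in the examples above. To keep every item but transform it differently depending on a condition, the `if ... else` goes at the front instead. A short sketch, with label strings chosen only for illustration:

```python
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]

# Filter: the result is shorter than the input
high_incomes = [inc for inc in incomes if inc > 100000]

# Transform: the result has one label per input value
labels = ["high" if inc > 100000 else "not high" for inc in incomes]

print(high_incomes)  # [120000, 150000]
print(labels)
# ['not high', 'not high', 'high', 'not high', 'not high', 'not high', 'high']
```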

2. enumerate(): Get Both Index and Value

python

students = ["Alice", "Bob", "Carol"]

# Not recommended (manually manage index)
for i in range(len(students)):
    print(f"{i+1}. {students[i]}")

# Recommended (use enumerate)
for i, student in enumerate(students, start=1):
    print(f"{i}. {student}")

# Output:
# 1. Alice
# 2. Bob
# 3. Carol
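
enumerate() is also handy when the position itself is the result you want, for example to report which survey rows have missing answers. A minimal sketch using made-up response data:

```python
# Survey data (None represents missing); values invented for this example
responses = [5, 4, None, 3, None, 5, 4, 2]

missing_positions = []
for position, response in enumerate(responses, start=1):
    if response is None:
        missing_positions.append(position)

print(missing_positions)  # [3, 5]
```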

3. zip(): Iterate Multiple Lists in Parallel

python

names = ["Alice", "Bob", "Carol"]
ages = [25, 30, 28]
majors = ["Economics", "Sociology", "Political Science"]

for name, age, major in zip(names, ages, majors):
    print(f"{name}, {age} years old, major: {major}")

# Output:
# Alice, 25 years old, major: Economics
# Bob, 30 years old, major: Sociology
# Carol, 28 years old, major: Political Science
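
One thing to watch for: zip() stops at the shortest input, so a list that is missing a value silently drops the trailing rows. The sketch below shows the truncation and a simple length check you might run first.

```python
names = ["Alice", "Bob", "Carol"]
ages = [25, 30]          # one value missing on purpose

pairs = list(zip(names, ages))
print(pairs)             # [('Alice', 25), ('Bob', 30)]  (Carol is dropped)

# A simple guard before looping in parallel
if len(names) != len(ages):
    print(f"Length mismatch: {len(names)} names vs {len(ages)} ages")
```

On Python 3.10 and later, `zip(names, ages, strict=True)` raises a ValueError instead of truncating.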

4. Nested Loops

python

# Generate all possible survey combinations
genders = ["Male", "Female"]
age_groups = ["18-30", "31-45", "46-60"]
education_levels = ["High School", "Bachelor's", "Master's"]

print("=== All Possible Respondent Types ===")
count = 0
for gender in genders:
    for age_group in age_groups:
        for education in education_levels:
            count += 1
            print(f"{count}. {gender}, {age_group}, {education}")

# Total: 2 × 3 × 3 = 18 combinations
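
Deeply nested loops get hard to read as more dimensions are added. The standard library's `itertools.product` produces the same combinations in the same order with a single loop. This is an alternative sketch, not a replacement for understanding the nested version:

```python
from itertools import product

genders = ["Male", "Female"]
age_groups = ["18-30", "31-45", "46-60"]
education_levels = ["High School", "Bachelor's", "Master's"]

# product() yields one tuple per combination, rightmost list varying fastest
combinations = list(product(genders, age_groups, education_levels))

for count, (gender, age_group, education) in enumerate(combinations, start=1):
    print(f"{count}. {gender}, {age_group}, {education}")

print(len(combinations))  # 18
```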

Complete Practical Example: Batch Survey Data Processing

python

# === Simulated survey data ===
survey_data = [
    {"id": 1, "age": 25, "income": 50000, "satisfaction": 4},
    {"id": 2, "age": -5, "income": 75000, "satisfaction": 5},  # Age abnormal
    {"id": 3, "age": 30, "income": -10000, "satisfaction": 3}, # Income abnormal
    {"id": 4, "age": 28, "income": 60000, "satisfaction": 6},  # Satisfaction abnormal
    {"id": 5, "age": 35, "income": 80000, "satisfaction": 4},
    {"id": 6, "age": 40, "income": 95000, "satisfaction": 5},
]

# === Data quality check ===
print("=== Survey Data Quality Check ===\n")

valid_responses = []
invalid_responses = []

for response in survey_data:
    resp_id = response["id"]
    age = response["age"]
    income = response["income"]
    satisfaction = response["satisfaction"]

    # Validation rules
    errors = []

    if age < 18 or age > 100:
        errors.append(f"Age abnormal({age})")

    if income < 0:
        errors.append(f"Income abnormal({income})")

    if satisfaction < 1 or satisfaction > 5:
        errors.append(f"Satisfaction abnormal({satisfaction})")

    # Classify
    if errors:
        print(f"✗ Survey {resp_id}: {', '.join(errors)}")
        invalid_responses.append(response)
    else:
        print(f"✓ Survey {resp_id}: Pass")
        valid_responses.append(response)

# === Summary statistics ===
print(f"\n=== Summary ===")
print(f"Total surveys: {len(survey_data)}")
print(f"Valid surveys: {len(valid_responses)}")
print(f"Invalid surveys: {len(invalid_responses)}")
print(f"Valid rate: {len(valid_responses)/len(survey_data)*100:.1f}%")

# === Descriptive statistics (valid data only) ===
if valid_responses:
    print(f"\n=== Valid Data Descriptive Statistics ===")

    # Age
    ages = [r["age"] for r in valid_responses]
    avg_age = sum(ages) / len(ages)
    print(f"Average age: {avg_age:.1f} years")

    # Income
    incomes = [r["income"] for r in valid_responses]
    avg_income = sum(incomes) / len(incomes)
    print(f"Average income: ${avg_income:,.0f}")

    # Satisfaction
    satisfactions = [r["satisfaction"] for r in valid_responses]
    avg_satisfaction = sum(satisfactions) / len(satisfactions)
    print(f"Average satisfaction: {avg_satisfaction:.2f} / 5.0")

Output:

=== Survey Data Quality Check ===

✓ Survey 1: Pass
✗ Survey 2: Age abnormal(-5)
✗ Survey 3: Income abnormal(-10000)
✗ Survey 4: Satisfaction abnormal(6)
✓ Survey 5: Pass
✓ Survey 6: Pass

=== Summary ===
Total surveys: 6
Valid surveys: 3
Invalid surveys: 3
Valid rate: 50.0%

=== Valid Data Descriptive Statistics ===
Average age: 33.3 years
Average income: $75,000
Average satisfaction: 4.33 / 5.0
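
As the number of validation rules grows, it can help to move them out of the loop body into a function that returns the list of errors. The sketch below uses the same rules as the example; `validate_response` is a hypothetical name, and the two sample records are made up to show one passing and one failing case.

```python
def validate_response(response):
    """Return a list of error messages; an empty list means the response passes."""
    errors = []
    if response["age"] < 18 or response["age"] > 100:
        errors.append(f"Age abnormal({response['age']})")
    if response["income"] < 0:
        errors.append(f"Income abnormal({response['income']})")
    if response["satisfaction"] < 1 or response["satisfaction"] > 5:
        errors.append(f"Satisfaction abnormal({response['satisfaction']})")
    return errors

# Sample records invented for this sketch
sample = [
    {"id": 1, "age": 25, "income": 50000, "satisfaction": 4},
    {"id": 2, "age": -5, "income": 75000, "satisfaction": 6},
]

for response in sample:
    errors = validate_response(response)
    status = ", ".join(errors) if errors else "Pass"
    print(f"Survey {response['id']}: {status}")

# Survey 1: Pass
# Survey 2: Age abnormal(-5), Satisfaction abnormal(6)
```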

Common Errors

Error 1: Modifying List While Iterating

python

# ✗ Dangerous operation
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)  # Can cause skipped elements

# ✓ Correct approach: create new list
numbers = [1, 2, 3, 4, 5]
odd_numbers = [num for num in numbers if num % 2 != 0]
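
To see the skipping in action, try a list with two even numbers next to each other. After the first 2 is removed, the remaining items shift left and the loop moves past the second 2 without checking it. Iterating over a copy (`buggy[:]` style slicing) is another way to avoid the problem when you really need to modify the list in place. The list values are chosen only to trigger the behavior.

```python
# Removing while iterating: one of the two 2s survives
buggy = [1, 2, 2, 3, 4]
for num in buggy:
    if num % 2 == 0:
        buggy.remove(num)
print(buggy)  # [1, 2, 3]

# Iterating over a copy: the loop sees every original element
safe = [1, 2, 2, 3, 4]
for num in safe[:]:
    if num % 2 == 0:
        safe.remove(num)
print(safe)   # [1, 3]
```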

Error 2: range() End Value

python

# ✗ Misunderstanding: expecting range(5) to include 5
for i in range(5):
    print(i)
# Output: 0 1 2 3 4 (doesn't include 5!)

# ✓ If you want to include 5
for i in range(1, 6):
    print(i)
# Output: 1 2 3 4 5

Error 3: Indentation Error

python

# ✗ Indentation problem
for i in range(3):
print(i)  # IndentationError

# ✓ Correct indentation
for i in range(3):
    print(i)

Practice Exercises

Exercise 1: Grade Statistics

python

scores = [85, 92, 78, 90, 65, 88, 95, 70]

# Tasks:
# 1. Calculate average score
# 2. Count passing students (>= 60)
# 3. Find highest and lowest scores
# 4. Calculate standard deviation (optional)

Exercise 2: Data Cleaning

python

raw_data = [
    {"name": "Alice", "age": 25, "income": 50000},
    {"name": "Bob", "age": -5, "income": 60000},      # Age abnormal
    {"name": "Carol", "age": 30, "income": None},     # Income missing
    {"name": "David", "age": 150, "income": 70000},   # Age abnormal
    {"name": "Emma", "age": 28, "income": 55000},
]

# Tasks:
# 1. Filter all valid data (age 18-100, income not null)
# 2. Calculate average age and income for valid data
# 3. Print cleaning report

Exercise 3: Cross-tabulation Statistics

python

# Generate gender × education level cross-tabulation
data = [
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Master's"},
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Bachelor's"},
    {"gender": "Male", "education": "Master's"},
]

# Task: Count each combination
# Expected output:
# Male, Bachelor's: 2
# Male, Master's: 1
# Female, Bachelor's: 1
# Female, Master's: 1

Next Steps

Congratulations on completing the Basic Syntax module! You have now mastered:

Variables and data types
Operators
Conditional statements
Loops

In the next module, we'll learn about data structures (lists, dictionaries, etc.), which are core tools for data analysis.

Ready? Let's keep going!
