
Loops (for/while)

Making Programs Repeat Tasks — The Key to Batch Data Processing


What are Loops?

Loops let a program run the same block of code repeatedly, so you write the logic once instead of copying it for every item.

Daily Analogy:

  • Manual approach: Check 1000 questionnaires one by one (exhausting)
  • Loop approach: Write checking logic once, automatically process 1000 times (easy)

Stata/R Comparison:

  • Stata: foreach, forvalues, while
  • R: for, while, apply family
  • Python: for, while

for Loop

1. Iterating Through Lists

python
# Student roster
students = ["Alice", "Bob", "Carol", "David"]

# Print one by one
for student in students:
    print(f"Student: {student}")

# Output:
# Student: Alice
# Student: Bob
# Student: Carol
# Student: David

Syntax:

python
for variable in sequence:
    code to execute

2. Iterating Through Number Ranges

python
# range(n): generates 0 to n-1
for i in range(5):
    print(i)

# Output: 0 1 2 3 4

# range(start, end): generates start to end-1
for i in range(1, 6):
    print(i)

# Output: 1 2 3 4 5

# range(start, end, step): specify step size
for i in range(0, 10, 2):
    print(i)

# Output: 0 2 4 6 8
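
To check which values a range() call produces without writing a loop, one option is to wrap it in list(). The sketch below repeats the three forms shown above and adds a negative step as one further illustration; the variable names are only for this example.

```python
# Inspect range() results by converting them to lists
first_five = list(range(5))         # stop only
one_to_five = list(range(1, 6))     # start, stop
evens = list(range(0, 10, 2))       # start, stop, step
countdown = list(range(5, 0, -1))   # a negative step counts down; 0 is excluded

print(first_five)   # [0, 1, 2, 3, 4]
print(one_to_five)  # [1, 2, 3, 4, 5]
print(evens)        # [0, 2, 4, 6, 8]
print(countdown)    # [5, 4, 3, 2, 1]
```

In every form the end value itself is left out, which is the detail that most often causes off-by-one mistakes.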

3. Stata/R Comparison

Stata Example:

stata
* Stata
forvalues i = 1/5 {
    display `i'
}

foreach var in age income education {
    summarize `var'
}

R Example:

r
# R
for (i in 1:5) {
  print(i)
}

for (var in c("age", "income", "education")) {
  print(summary(data[[var]]))
}

Python Example:

python
# Python
for i in range(1, 6):
    print(i)

variables = ["age", "income", "education"]
for var in variables:
    print(f"Statistics for variable {var}")
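
The Python loop above only prints the variable names. A minimal sketch of what a per-variable summary could look like with plain lists and the standard-library statistics module follows. The data dictionary, its values, and the education coding are hypothetical and exist only for this illustration.

```python
import statistics

# Hypothetical survey data: one list of values per variable
data = {
    "age": [25, 30, 28, 35],
    "income": [50000, 75000, 60000, 80000],
    "education": [12, 16, 16, 18],  # years of schooling (assumed coding)
}

summaries = {}
for var in ["age", "income", "education"]:
    values = data[var]
    # Store a few basic statistics for each variable
    summaries[var] = {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }
    print(f"{var}: mean={summaries[var]['mean']:.1f}, "
          f"min={summaries[var]['min']}, max={summaries[var]['max']}")
```

The loop body is written once and applied to each variable name, which mirrors what foreach does in Stata.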

Practical Cases: Data Processing

Case 1: Batch BMI Calculation

python
# Respondent data
heights = [170, 165, 180, 175, 160]  # cm
weights = [65, 55, 80, 70, 50]       # kg

# Batch calculate BMI
bmis = []
for i in range(len(heights)):
    height_m = heights[i] / 100
    bmi = weights[i] / (height_m ** 2)
    bmis.append(bmi)
    print(f"Respondent {i+1}: BMI = {bmi:.2f}")

# Output:
# Respondent 1: BMI = 22.49
# Respondent 2: BMI = 20.20
# Respondent 3: BMI = 24.69
# ...

More Pythonic Approach (Using zip):

python
heights = [170, 165, 180, 175, 160]
weights = [65, 55, 80, 70, 50]

for i, (h, w) in enumerate(zip(heights, weights), start=1):
    height_m = h / 100
    bmi = w / (height_m ** 2)
    print(f"Respondent {i}: BMI = {bmi:.2f}")

Case 2: Data Quality Check

python
# Survey data (age column)
ages = [25, 30, -5, 150, 45, 28, 0, 35]

# Check for outliers
print("=== Age Data Quality Check ===")
valid_count = 0
invalid_count = 0
suspicious_count = 0

for i, age in enumerate(ages, start=1):
    if age < 0 or age > 120:
        print(f"✗ Data {i} abnormal: {age}")
        invalid_count += 1
    elif age == 0:
        print(f"⚠️  Data {i} suspicious: {age}")
        suspicious_count += 1  # Count flagged values so the summary adds up
    else:
        print(f"✓ Data {i} normal: {age}")
        valid_count += 1

print(f"\nSummary: {valid_count} normal, {suspicious_count} suspicious, {invalid_count} abnormal")

Case 3: Income Group Statistics

python
# Respondent incomes
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]

# Group statistics
low_income = 0    # < 50000
mid_income = 0    # 50000-100000
high_income = 0   # > 100000

for income in incomes:
    if income < 50000:
        low_income += 1
    elif income <= 100000:
        mid_income += 1
    else:
        high_income += 1

print(f"Low income: {low_income} people")
print(f"Middle income: {mid_income} people")
print(f"High income: {high_income} people")

# Output:
# Low income: 2 people
# Middle income: 3 people
# High income: 2 people
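
When the same cut-offs are needed in more than one place, the grouping rule can be moved into a small function and the counts kept in a dictionary. This is only a sketch under the thresholds used above; classify_income and group_counts are hypothetical names introduced for the example.

```python
def classify_income(income):
    """Return an income group label using the cut-offs from Case 3."""
    if income < 50000:
        return "low"
    elif income <= 100000:
        return "middle"
    return "high"


incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]

# One counter per group, updated inside the loop
group_counts = {"low": 0, "middle": 0, "high": 0}
for income in incomes:
    group_counts[classify_income(income)] += 1

print(group_counts)  # {'low': 2, 'middle': 3, 'high': 2}
```

Keeping the boundaries in one function makes it easier to see that 50000 and 100000 both fall in the middle group.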

while Loop

Basic Syntax

python
# while loop: continues as long as condition is true
count = 0

while count < 5:
    print(f"Count: {count}")
    count += 1

# Output (one line each): Count: 0, Count: 1, Count: 2, Count: 3, Count: 4

Practical Scenario: Data Collection

python
# Simulate survey data collection (until 100 responses collected)
responses = 0
target = 100

while responses < target:
    # Simulate data collection (in practice, read from database)
    responses += 1

    if responses % 10 == 0:  # Show progress every 10
        progress = (responses / target) * 100
        print(f"Progress: {progress:.0f}% ({responses}/{target})")

print("✓ Data collection complete!")

Avoid Infinite Loops

python
# ✗ Infinite loop (never stops)
count = 0
while count < 5:
    print(count)
    # Forgot to increment count! Program hangs

# ✓ Correct approach
count = 0
while count < 5:
    print(count)
    count += 1  # Ensure condition eventually becomes False
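
When it is hard to be sure that a condition will eventually become False, one defensive pattern is to add an iteration cap. The sketch below uses a made-up halving task; value, threshold, and max_iterations are assumptions chosen for illustration, not part of the original examples.

```python
# Hypothetical task: halve a value until it drops below a threshold
value = 1000.0
threshold = 1.0
max_iterations = 100  # safety cap (assumed limit, adjust as needed)

iterations = 0
while value >= threshold:
    value /= 2
    iterations += 1
    if iterations >= max_iterations:
        # Leave the loop even if the condition never becomes False
        print("Stopped: iteration cap reached")
        break

print(f"Finished after {iterations} iterations, value = {value:.3f}")
# Finished after 10 iterations, value = 0.977
```

Here the cap is never reached, but it guarantees that the loop ends even if the update step is wrong.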

Loop Control: break and continue

1. break: Exit Loop Early

python
# Stop at first invalid data
ages = [25, 30, 45, -5, 28, 35]

for i, age in enumerate(ages, start=1):
    if age < 0 or age > 120:
        print(f"✗ Abnormal data found (#{i}): {age}")
        print("Stopping check")
        break  # Immediately exit loop
    else:
        print(f"✓ Data {i} normal")

# Output:
# ✓ Data 1 normal
# ✓ Data 2 normal
# ✓ Data 3 normal
# ✗ Abnormal data found (#4): -5
# Stopping check

2. continue: Skip Current Iteration

python
# Process only even numbers
for i in range(10):
    if i % 2 != 0:  # If odd
        continue    # Skip remaining code, go to next iteration

    print(f"{i} is even")

# Output (one line each): 0 is even, 2 is even, 4 is even, 6 is even, 8 is even

Practical: Skip Missing Data

python
# Survey data (None represents missing)
responses = [5, 4, None, 3, None, 5, 4, 2]

# Calculate average score (skip missing values)
total = 0
count = 0

for response in responses:
    if response is None:
        continue  # Skip missing values

    total += response
    count += 1

if count > 0:
    average = total / count
    print(f"Average score: {average:.2f} (valid samples: {count})")

# Output: Average score: 3.83 (valid samples: 6)
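
The same calculation can also be written by filtering first and summing afterwards. This is a sketch of an equivalent approach, not a replacement for the continue version; valid and missing_count are names introduced here.

```python
responses = [5, 4, None, 3, None, 5, 4, 2]

# Keep only the answers that are not missing
valid = [r for r in responses if r is not None]
missing_count = len(responses) - len(valid)

if valid:  # Guard against dividing by zero when every answer is missing
    average = sum(valid) / len(valid)
    print(f"Average score: {average:.2f} "
          f"(valid: {len(valid)}, missing: {missing_count})")

# Output: Average score: 3.83 (valid: 6, missing: 2)
```

Using `is not None` rather than a plain truth test matters for survey data, because a legitimate answer of 0 would otherwise be dropped as well.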

Advanced Loop Techniques

1. List Comprehensions

More concise loop syntax:

python
# Traditional for loop
squares = []
for i in range(10):
    squares.append(i ** 2)

# List comprehension (one line)
squares = [i ** 2 for i in range(10)]
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# List comprehension with condition
# Keep only squares of even numbers
even_squares = [i ** 2 for i in range(10) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

Social Science Application:

python
# Filter high-income respondents
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]
high_incomes = [inc for inc in incomes if inc > 100000]
print(high_incomes)  # [120000, 150000]

# Log transformation of incomes
import math
log_incomes = [math.log(inc) for inc in incomes if inc > 0]
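
One thing to keep in mind with the filtered log transformation: entries that fail the condition are dropped, so the result can be shorter than the input. If each output needs to stay aligned with its respondent, a conditional expression inside the comprehension is one option. The income values below are hypothetical and include non-positive entries on purpose.

```python
import math

incomes = [45000, 0, 120000, -500, 95000]  # hypothetical values

# Filtering drops entries, so the result is shorter than the input
log_filtered = [math.log(inc) for inc in incomes if inc > 0]

# A conditional expression keeps one output per input (None marks undefined)
log_aligned = [math.log(inc) if inc > 0 else None for inc in incomes]

print(len(incomes), len(log_filtered), len(log_aligned))  # 5 3 5
```

Which form is appropriate depends on whether later steps need the positions to match the original list.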

2. enumerate(): Get Both Index and Value

python
students = ["Alice", "Bob", "Carol"]

# Not recommended (manually manage index)
for i in range(len(students)):
    print(f"{i+1}. {students[i]}")

# Recommended (use enumerate)
for i, student in enumerate(students, start=1):
    print(f"{i}. {student}")

# Output:
# 1. Alice
# 2. Bob
# 3. Carol

3. zip(): Iterate Multiple Lists in Parallel

python
names = ["Alice", "Bob", "Carol"]
ages = [25, 30, 28]
majors = ["Economics", "Sociology", "Political Science"]

for name, age, major in zip(names, ages, majors):
    print(f"{name}, {age} years old, major: {major}")

# Output:
# Alice, 25 years old, major: Economics
# Bob, 30 years old, major: Sociology
# Carol, 28 years old, major: Political Science
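
zip() stops at the shortest input, so a list with a missing entry shortens the result without any error. The sketch below shows this with deliberately mismatched lists and a simple length check; the data is made up for the example.

```python
names = ["Alice", "Bob", "Carol"]
ages = [25, 30]  # one value is missing on purpose

pairs = list(zip(names, ages))
print(pairs)  # [('Alice', 25), ('Bob', 30)]  Carol is silently dropped

# A simple guard before zipping lists that should line up
lengths_match = len(names) == len(ages)
if not lengths_match:
    print(f"Length mismatch: {len(names)} names vs {len(ages)} ages")
```

In Python 3.10 and later, zip(names, ages, strict=True) raises a ValueError on a length mismatch instead of truncating.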

4. Nested Loops

python
# Generate all possible survey combinations
genders = ["Male", "Female"]
age_groups = ["18-30", "31-45", "46-60"]
education_levels = ["High School", "Bachelor's", "Master's"]

print("=== All Possible Respondent Types ===")
count = 0
for gender in genders:
    for age_group in age_groups:
        for education in education_levels:
            count += 1
            print(f"{count}. {gender}, {age_group}, {education}")

# Total: 2 × 3 × 3 = 18 combinations
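
Deeply nested loops become hard to read as more dimensions are added. As an alternative sketch, itertools.product from the standard library generates the same combinations in the same order as the three nested loops above; the combinations variable is a name introduced for this example.

```python
from itertools import product

genders = ["Male", "Female"]
age_groups = ["18-30", "31-45", "46-60"]
education_levels = ["High School", "Bachelor's", "Master's"]

# product() yields one tuple per combination, rightmost list varying fastest
combinations = list(product(genders, age_groups, education_levels))

for count, (gender, age_group, education) in enumerate(combinations, start=1):
    print(f"{count}. {gender}, {age_group}, {education}")

print(f"Total: {len(combinations)} combinations")  # Total: 18 combinations
```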

Complete Practical Example: Batch Survey Data Processing

python
# === Simulated survey data ===
survey_data = [
    {"id": 1, "age": 25, "income": 50000, "satisfaction": 4},
    {"id": 2, "age": -5, "income": 75000, "satisfaction": 5},  # Age abnormal
    {"id": 3, "age": 30, "income": -10000, "satisfaction": 3}, # Income abnormal
    {"id": 4, "age": 28, "income": 60000, "satisfaction": 6},  # Satisfaction abnormal
    {"id": 5, "age": 35, "income": 80000, "satisfaction": 4},
    {"id": 6, "age": 40, "income": 95000, "satisfaction": 5},
]

# === Data quality check ===
print("=== Survey Data Quality Check ===\n")

valid_responses = []
invalid_responses = []

for response in survey_data:
    resp_id = response["id"]
    age = response["age"]
    income = response["income"]
    satisfaction = response["satisfaction"]

    # Validation rules
    errors = []

    if age < 18 or age > 100:
        errors.append(f"Age abnormal({age})")

    if income < 0:
        errors.append(f"Income abnormal({income})")

    if satisfaction < 1 or satisfaction > 5:
        errors.append(f"Satisfaction abnormal({satisfaction})")

    # Classify
    if errors:
        print(f"✗ Survey {resp_id}: {', '.join(errors)}")
        invalid_responses.append(response)
    else:
        print(f"✓ Survey {resp_id}: Pass")
        valid_responses.append(response)

# === Summary statistics ===
print(f"\n=== Summary ===")
print(f"Total surveys: {len(survey_data)}")
print(f"Valid surveys: {len(valid_responses)}")
print(f"Invalid surveys: {len(invalid_responses)}")
print(f"Valid rate: {len(valid_responses)/len(survey_data)*100:.1f}%")

# === Descriptive statistics (valid data only) ===
if valid_responses:
    print(f"\n=== Valid Data Descriptive Statistics ===")

    # Age
    ages = [r["age"] for r in valid_responses]
    avg_age = sum(ages) / len(ages)
    print(f"Average age: {avg_age:.1f} years")

    # Income
    incomes = [r["income"] for r in valid_responses]
    avg_income = sum(incomes) / len(incomes)
    print(f"Average income: ${avg_income:,.0f}")

    # Satisfaction
    satisfactions = [r["satisfaction"] for r in valid_responses]
    avg_satisfaction = sum(satisfactions) / len(satisfactions)
    print(f"Average satisfaction: {avg_satisfaction:.2f} / 5.0")

Output:

=== Survey Data Quality Check ===

✓ Survey 1: Pass
✗ Survey 2: Age abnormal(-5)
✗ Survey 3: Income abnormal(-10000)
✗ Survey 4: Satisfaction abnormal(6)
✓ Survey 5: Pass
✓ Survey 6: Pass

=== Summary ===
Total surveys: 6
Valid surveys: 3
Invalid surveys: 3
Valid rate: 50.0%

=== Valid Data Descriptive Statistics ===
Average age: 33.3 years
Average income: $75,000
Average satisfaction: 4.33 / 5.0
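
If the validation rules are needed in several scripts, they could be collected in a function that returns the list of problems for one response. This is a sketch that reuses the rules from the example above; validate_response and the two sample records are hypothetical.

```python
def validate_response(response):
    """Return a list of error messages; an empty list means the response passes."""
    errors = []
    if response["age"] < 18 or response["age"] > 100:
        errors.append(f"Age abnormal({response['age']})")
    if response["income"] < 0:
        errors.append(f"Income abnormal({response['income']})")
    if response["satisfaction"] < 1 or response["satisfaction"] > 5:
        errors.append(f"Satisfaction abnormal({response['satisfaction']})")
    return errors


# Hypothetical sample: the second record breaks two rules at once
sample = [
    {"id": 1, "age": 25, "income": 50000, "satisfaction": 4},
    {"id": 2, "age": -5, "income": 75000, "satisfaction": 6},
]

valid_ids = [r["id"] for r in sample if not validate_response(r)]
print(valid_ids)                      # [1]
print(validate_response(sample[1]))   # ['Age abnormal(-5)', 'Satisfaction abnormal(6)']
```

Because every rule is checked independently, a single response can report more than one error.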

Common Errors

Error 1: Modifying List While Iterating

python
# ✗ Dangerous operation
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)  # Can cause skipped elements

# ✓ Correct approach: create new list
numbers = [1, 2, 3, 4, 5]
odd_numbers = [num for num in numbers if num % 2 != 0]
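
With the list [1, 2, 3, 4, 5] the dangerous version happens to give the expected result, which can hide the problem. The sketch below uses a list where the skipping becomes visible, then shows iterating over a copy as another way to avoid it. The variable names are introduced only for this demonstration.

```python
# Removing while iterating: the element after each removed item is skipped
skipped_demo = [2, 4, 6]
for num in skipped_demo:
    if num % 2 == 0:
        skipped_demo.remove(num)
print(skipped_demo)  # [4]  (4 was never checked)

# Iterating over a copy (numbers[:]) leaves the loop unaffected by removals
copy_demo = [2, 4, 6]
for num in copy_demo[:]:
    if num % 2 == 0:
        copy_demo.remove(num)
print(copy_demo)  # []
```

After 2 is removed, the remaining items shift left, so the loop's next position points at 6 and 4 is never visited.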

Error 2: range() End Value

python
# ✗ Misunderstanding: expecting range(5) to include 5
for i in range(5):
    print(i)
# Output: 0 1 2 3 4 (doesn't include 5!)

# ✓ To count from 1 through 5, set the end value to 6
for i in range(1, 6):
    print(i)
# Output: 1 2 3 4 5

Error 3: Indentation Error

python
# ✗ Indentation problem
for i in range(3):
print(i)  # IndentationError

# ✓ Correct indentation
for i in range(3):
    print(i)

Practice Exercises

Exercise 1: Grade Statistics

python
scores = [85, 92, 78, 90, 65, 88, 95, 70]

# Tasks:
# 1. Calculate average score
# 2. Count passing students (>= 60)
# 3. Find highest and lowest scores
# 4. Calculate standard deviation (optional)

Exercise 2: Data Cleaning

python
raw_data = [
    {"name": "Alice", "age": 25, "income": 50000},
    {"name": "Bob", "age": -5, "income": 60000},      # Age abnormal
    {"name": "Carol", "age": 30, "income": None},     # Income missing
    {"name": "David", "age": 150, "income": 70000},   # Age abnormal
    {"name": "Emma", "age": 28, "income": 55000},
]

# Tasks:
# 1. Filter all valid data (age 18-100, income not null)
# 2. Calculate average age and income for valid data
# 3. Print cleaning report

Exercise 3: Cross-tabulation Statistics

python
# Generate gender × education level cross-tabulation
data = [
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Master's"},
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Bachelor's"},
    {"gender": "Male", "education": "Master's"},
]

# Task: Count each combination
# Expected output:
# Male, Bachelor's: 2
# Male, Master's: 1
# Female, Bachelor's: 1
# Female, Master's: 1

Next Steps

Congratulations on completing the Basic Syntax module! You have now mastered:

  • Variables and data types
  • Operators
  • Conditional statements
  • Loops

In the next module, we'll learn about data structures (lists, dictionaries, etc.), which are core tools for data analysis.

Ready? Let's keep going!

Released under the MIT License. Content © Author.