Loops (for/while)
Making Programs Repeat Tasks — The Key to Batch Data Processing
What are Loops?
Loops allow programs to execute code repeatedly, avoiding redundant writing.
Daily Analogy:
- Manual approach: Check 1000 questionnaires one by one (exhausting)
- Loop approach: Write checking logic once, automatically process 1000 times (easy)
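The analogy can be sketched in a few lines of Python. This is a minimal illustration under assumed details: the `is_complete` helper and the simulated questionnaires are hypothetical, not taken from any real survey.

```python
# Hypothetical check: a questionnaire counts as complete when no answer is missing
def is_complete(questionnaire):
    return None not in questionnaire

# Simulate 1000 questionnaires with three answers each; every 50th has a missing answer
questionnaires = []
for n in range(1000):
    if n % 50 == 0:
        questionnaires.append([5, None, 3])
    else:
        questionnaires.append([5, 4, 3])

# The checking logic is written once and the loop applies it 1000 times
incomplete = 0
for q in questionnaires:
    if not is_complete(q):
        incomplete += 1

print(f"Checked {len(questionnaires)} questionnaires, {incomplete} incomplete")
```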
Stata/R Comparison:
- Stata: foreach, forvalues, while
- R: for, while, apply family
- Python: for, while
for Loop
1. Iterating Through Lists
python
# Student roster
students = ["Alice", "Bob", "Carol", "David"]
# Print one by one
for student in students:
    print(f"Student: {student}")
# Output:
# Student: Alice
# Student: Bob
# Student: Carol
# Student: David
Syntax:
python
for variable in sequence:
    code to execute
2. Iterating Through Number Ranges
python
# range(n): generates 0 to n-1
for i in range(5):
    print(i)
# Output: 0 1 2 3 4
# range(start, end): generates start to end-1
for i in range(1, 6):
    print(i)
# Output: 1 2 3 4 5
# range(start, end, step): specify step size
for i in range(0, 10, 2):
    print(i)
# Output: 0 2 4 6 8
3. Stata/R Comparison
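One detail to watch when translating the loops in this comparison: Stata's `1/5` and R's `1:5` both include the endpoint 5, whereas Python's `range` stops just before its end value. A small sketch of the equivalent ranges (the variable names are illustrative):

```python
# Stata `forvalues i = 1/5` and R `for (i in 1:5)` visit 1, 2, 3, 4, 5
stata_r_style = list(range(1, 5 + 1))  # add 1 to the end value to include 5
print(stata_r_style)  # [1, 2, 3, 4, 5]

# Forgetting the +1 silently drops the last value
off_by_one = list(range(1, 5))
print(off_by_one)  # [1, 2, 3, 4]
```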
Stata Example:
stata
* Stata
forvalues i = 1/5 {
    display `i'
}
foreach var in age income education {
    summarize `var'
}
R Example:
r
# R
for (i in 1:5) {
  print(i)
}
for (var in c("age", "income", "education")) {
  print(summary(data[[var]]))
}
Python Example:
python
# Python
for i in range(1, 6):
    print(i)
variables = ["age", "income", "education"]
for var in variables:
    print(f"Statistics for variable {var}")
Practical Cases: Data Processing
Case 1: Batch BMI Calculation
python
# Respondent data
heights = [170, 165, 180, 175, 160] # cm
weights = [65, 55, 80, 70, 50] # kg
# Batch calculate BMI
bmis = []
for i in range(len(heights)):
    height_m = heights[i] / 100
    bmi = weights[i] / (height_m ** 2)
    bmis.append(bmi)
    print(f"Respondent {i+1}: BMI = {bmi:.2f}")
# Output:
# Respondent 1: BMI = 22.49
# Respondent 2: BMI = 20.20
# Respondent 3: BMI = 24.69
# ...
More Pythonic Approach (Using zip):
python
heights = [170, 165, 180, 175, 160]
weights = [65, 55, 80, 70, 50]
for i, (h, w) in enumerate(zip(heights, weights), start=1):
    height_m = h / 100
    bmi = w / (height_m ** 2)
    print(f"Respondent {i}: BMI = {bmi:.2f}")
Case 2: Data Quality Check
python
# Survey data (age column)
ages = [25, 30, -5, 150, 45, 28, 0, 35]
# Check for outliers
print("=== Age Data Quality Check ===")
valid_count = 0
suspicious_count = 0
invalid_count = 0
for i, age in enumerate(ages, start=1):
    if age < 0 or age > 120:
        print(f"✗ Data {i} abnormal: {age}")
        invalid_count += 1
    elif age == 0:
        print(f"⚠️ Data {i} suspicious: {age}")
        suspicious_count += 1  # Count suspicious rows so the totals add up
    else:
        print(f"✓ Data {i} normal: {age}")
        valid_count += 1
print(f"\nSummary: {valid_count} normal, {suspicious_count} suspicious, {invalid_count} abnormal")
Case 3: Income Group Statistics
python
# Respondent incomes
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]
# Group statistics
low_income = 0 # < 50000
mid_income = 0 # 50000-100000
high_income = 0 # > 100000
for income in incomes:
    if income < 50000:
        low_income += 1
    elif income <= 100000:
        mid_income += 1
    else:
        high_income += 1
print(f"Low income: {low_income} people")
print(f"Middle income: {mid_income} people")
print(f"High income: {high_income} people")
# Output:
# Low income: 2 people
# Middle income: 3 people
# High income: 2 people
while Loop
Basic Syntax
python
# while loop: continues as long as condition is true
count = 0
while count < 5:
    print(f"Count: {count}")
    count += 1
# Output: Count: 0 through Count: 4, one per line
Practical Scenario: Data Collection
python
# Simulate survey data collection (until 100 responses collected)
responses = 0
target = 100
while responses < target:
    # Simulate data collection (in practice, read from database)
    responses += 1
    if responses % 10 == 0:  # Show progress every 10
        progress = (responses / target) * 100
        print(f"Progress: {progress:.0f}% ({responses}/{target})")
print("✓ Data collection complete!")
Avoid Infinite Loops
python
# ✗ Infinite loop (never stops)
count = 0
while count < 5:
    print(count)
    # Forgot to increment count! Program hangs
# ✓ Correct approach
count = 0
while count < 5:
    print(count)
    count += 1  # Ensure condition eventually becomes False
Loop Control: break and continue
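As a sketch connecting these two statements to the infinite-loop warning above: `break` leaves a loop immediately, so it can serve as a safety cap on a `while` loop whose condition might never become False. The `max_attempts` limit and the simulated response pattern are assumptions made for illustration.

```python
target = 10        # responses we hope to collect
max_attempts = 20  # hypothetical safety cap on loop iterations

responses = 0
attempts = 0
while responses < target:
    if attempts >= max_attempts:
        print("Safety cap reached, stopping")
        break
    attempts += 1
    # Simulated collection: only every third attempt yields a response
    if attempts % 3 == 0:
        responses += 1

print(f"{responses} responses after {attempts} attempts")
```

Here the target is never reached, yet the loop still ends because the cap triggers `break`.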
1. break: Exit Loop Early
python
# Stop at first invalid data
ages = [25, 30, 45, -5, 28, 35]
for i, age in enumerate(ages, start=1):
    if age < 0 or age > 120:
        print(f"✗ Abnormal data found (#{i}): {age}")
        print("Stopping check")
        break  # Immediately exit loop
    else:
        print(f"✓ Data {i} normal")
# Output:
# ✓ Data 1 normal
# ✓ Data 2 normal
# ✓ Data 3 normal
# ✗ Abnormal data found (#4): -5
# Stopping check
2. continue: Skip Current Iteration
python
# Process only even numbers
for i in range(10):
    if i % 2 != 0:  # If odd
        continue  # Skip remaining code, go to next iteration
    print(f"{i} is even")
# Output: 0 is even, 2 is even, 4 is even, 6 is even, 8 is even (one per line)
Practical: Skip Missing Data
python
# Survey data (None represents missing)
responses = [5, 4, None, 3, None, 5, 4, 2]
# Calculate average score (skip missing values)
total = 0
count = 0
for response in responses:
    if response is None:
        continue  # Skip missing values
    total += response
    count += 1
if count > 0:
    average = total / count
    print(f"Average score: {average:.2f} (valid samples: {count})")
# Output: Average score: 3.83 (valid samples: 6)
Advanced Loop Techniques
1. List Comprehensions
More concise loop syntax:
python
# Traditional for loop
squares = []
for i in range(10):
    squares.append(i ** 2)
# List comprehension (one line)
squares = [i ** 2 for i in range(10)]
print(squares) # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
# List comprehension with condition
# Keep only squares of even numbers
even_squares = [i ** 2 for i in range(10) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]
Social Science Application:
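A hedged sketch of one survey-style use: the missing-value average from the `continue` example can be condensed by filtering `None` inside a comprehension. The `responses` values repeat that earlier example; the `valid` name is illustrative.

```python
responses = [5, 4, None, 3, None, 5, 4, 2]

# Keep only the answers that are present
valid = [r for r in responses if r is not None]

if valid:  # guard against dividing by zero when every answer is missing
    average = sum(valid) / len(valid)
    print(f"Average score: {average:.2f} (valid samples: {len(valid)})")
```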
python
# Filter high-income respondents
incomes = [45000, 75000, 120000, 55000, 95000, 30000, 150000]
high_incomes = [inc for inc in incomes if inc > 100000]
print(high_incomes) # [120000, 150000]
# Log transformation of incomes
import math
log_incomes = [math.log(inc) for inc in incomes if inc > 0]
2. enumerate(): Get Both Index and Value
python
students = ["Alice", "Bob", "Carol"]
# Not recommended (manually manage index)
for i in range(len(students)):
    print(f"{i+1}. {students[i]}")
# Recommended (use enumerate)
for i, student in enumerate(students, start=1):
    print(f"{i}. {student}")
# Output:
# 1. Alice
# 2. Bob
# 3. Carol
3. zip(): Iterate Multiple Lists in Parallel
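One behavior worth knowing before relying on `zip()`: it stops at the shortest input, so a list with a missing entry is truncated without any error. A minimal sketch with made-up data; the explicit length check is one possible safeguard, not the only one.

```python
names = ["Alice", "Bob", "Carol"]
ages = [25, 30]  # one age is missing

# zip() pairs items until the shorter list runs out, so Carol is dropped
pairs = list(zip(names, ages))
print(pairs)  # [('Alice', 25), ('Bob', 30)]

# A simple guard: compare lengths before looping
lengths_match = len(names) == len(ages)
if not lengths_match:
    print(f"Length mismatch: {len(names)} names vs {len(ages)} ages")
```

In Python 3.10 and later, `zip(names, ages, strict=True)` raises a `ValueError` on mismatched lengths instead of truncating.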
python
names = ["Alice", "Bob", "Carol"]
ages = [25, 30, 28]
majors = ["Economics", "Sociology", "Political Science"]
for name, age, major in zip(names, ages, majors):
    print(f"{name}, {age} years old, major: {major}")
# Output:
# Alice, 25 years old, major: Economics
# Bob, 30 years old, major: Sociology
# Carol, 28 years old, major: Political Science
4. Nested Loops
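When loops nest several levels deep, the standard library's `itertools.product` offers a flatter way to walk every combination. This sketch is an alternative using the same category lists as this section, not a replacement for understanding nested loops.

```python
from itertools import product

genders = ["Male", "Female"]
age_groups = ["18-30", "31-45", "46-60"]
education_levels = ["High School", "Bachelor's", "Master's"]

# product() yields one tuple per combination, in the same order as nested loops
combinations = list(product(genders, age_groups, education_levels))

for count, (gender, age_group, education) in enumerate(combinations, start=1):
    print(f"{count}. {gender}, {age_group}, {education}")

print(f"Total: {len(combinations)} combinations")  # Total: 18 combinations
```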
python
# Generate all possible survey combinations
genders = ["Male", "Female"]
age_groups = ["18-30", "31-45", "46-60"]
education_levels = ["High School", "Bachelor's", "Master's"]
print("=== All Possible Respondent Types ===")
count = 0
for gender in genders:
    for age_group in age_groups:
        for education in education_levels:
            count += 1
            print(f"{count}. {gender}, {age_group}, {education}")
# Total: 2 × 3 × 3 = 18 combinations
Complete Practical Example: Batch Survey Data Processing
python
# === Simulated survey data ===
survey_data = [
    {"id": 1, "age": 25, "income": 50000, "satisfaction": 4},
    {"id": 2, "age": -5, "income": 75000, "satisfaction": 5},  # Age abnormal
    {"id": 3, "age": 30, "income": -10000, "satisfaction": 3},  # Income abnormal
    {"id": 4, "age": 28, "income": 60000, "satisfaction": 6},  # Satisfaction abnormal
    {"id": 5, "age": 35, "income": 80000, "satisfaction": 4},
    {"id": 6, "age": 40, "income": 95000, "satisfaction": 5},
]
# === Data quality check ===
print("=== Survey Data Quality Check ===\n")
valid_responses = []
invalid_responses = []
for response in survey_data:
    resp_id = response["id"]
    age = response["age"]
    income = response["income"]
    satisfaction = response["satisfaction"]
    # Validation rules
    errors = []
    if age < 18 or age > 100:
        errors.append(f"Age abnormal({age})")
    if income < 0:
        errors.append(f"Income abnormal({income})")
    if satisfaction < 1 or satisfaction > 5:
        errors.append(f"Satisfaction abnormal({satisfaction})")
    # Classify
    if errors:
        print(f"✗ Survey {resp_id}: {', '.join(errors)}")
        invalid_responses.append(response)
    else:
        print(f"✓ Survey {resp_id}: Pass")
        valid_responses.append(response)
# === Summary statistics ===
print(f"\n=== Summary ===")
print(f"Total surveys: {len(survey_data)}")
print(f"Valid surveys: {len(valid_responses)}")
print(f"Invalid surveys: {len(invalid_responses)}")
print(f"Valid rate: {len(valid_responses)/len(survey_data)*100:.1f}%")
# === Descriptive statistics (valid data only) ===
if valid_responses:
    print(f"\n=== Valid Data Descriptive Statistics ===")
    # Age
    ages = [r["age"] for r in valid_responses]
    avg_age = sum(ages) / len(ages)
    print(f"Average age: {avg_age:.1f} years")
    # Income
    incomes = [r["income"] for r in valid_responses]
    avg_income = sum(incomes) / len(incomes)
    print(f"Average income: ${avg_income:,.0f}")
    # Satisfaction
    satisfactions = [r["satisfaction"] for r in valid_responses]
    avg_satisfaction = sum(satisfactions) / len(satisfactions)
    print(f"Average satisfaction: {avg_satisfaction:.2f} / 5.0")
Output:
=== Survey Data Quality Check ===
✓ Survey 1: Pass
✗ Survey 2: Age abnormal(-5)
✗ Survey 3: Income abnormal(-10000)
✗ Survey 4: Satisfaction abnormal(6)
✓ Survey 5: Pass
✓ Survey 6: Pass
=== Summary ===
Total surveys: 6
Valid surveys: 3
Invalid surveys: 3
Valid rate: 50.0%
=== Valid Data Descriptive Statistics ===
Average age: 33.3 years
Average income: $75,000
Average satisfaction: 4.33 / 5.0
Common Errors
Error 1: Modifying List While Iterating
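Removing items from a list while looping over it shifts the remaining items left, so the loop can step over elements. A small sketch using consecutive even numbers (values and variable names chosen for illustration) makes the skipping visible:

```python
buggy = [2, 4, 6, 8]

# Intended: remove every even number, leaving an empty list
for num in buggy:
    if num % 2 == 0:
        buggy.remove(num)

print(buggy)  # [4, 8]: two even numbers were skipped

# Iterating over a copy keeps the loop unaffected by the removals
safe = [2, 4, 6, 8]
for num in safe[:]:
    if num % 2 == 0:
        safe.remove(num)

print(safe)  # []
```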
python
# ✗ Dangerous operation
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)  # Can cause skipped elements
# ✓ Correct approach: create new list
numbers = [1, 2, 3, 4, 5]
odd_numbers = [num for num in numbers if num % 2 != 0]
Error 2: range() End Value
python
# ✗ Misunderstanding: expecting range(5) to reach 5
for i in range(5):
    print(i)
# Output: 0 1 2 3 4 (doesn't include 5!)
# ✓ If you want to include 5, set the end value to 6
for i in range(1, 6):
    print(i)
# Output: 1 2 3 4 5
Error 3: Indentation Error
python
# ✗ Indentation problem
for i in range(3):
print(i) # IndentationError
# ✓ Correct indentation
for i in range(3):
    print(i)
Practice Exercises
Exercise 1: Grade Statistics
python
scores = [85, 92, 78, 90, 65, 88, 95, 70]
# Tasks:
# 1. Calculate average score
# 2. Count passing students (>= 60)
# 3. Find highest and lowest scores
# 4. Calculate standard deviation (optional)
Exercise 2: Data Cleaning
python
raw_data = [
    {"name": "Alice", "age": 25, "income": 50000},
    {"name": "Bob", "age": -5, "income": 60000},  # Age abnormal
    {"name": "Carol", "age": 30, "income": None},  # Income missing
    {"name": "David", "age": 150, "income": 70000},  # Age abnormal
    {"name": "Emma", "age": 28, "income": 55000},
]
# Tasks:
# 1. Filter all valid data (age 18-100, income not null)
# 2. Calculate average age and income for valid data
# 3. Print cleaning report
Exercise 3: Cross-tabulation Statistics
python
# Generate gender × education level cross-tabulation
data = [
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Master's"},
    {"gender": "Male", "education": "Bachelor's"},
    {"gender": "Female", "education": "Bachelor's"},
    {"gender": "Male", "education": "Master's"},
]
# Task: Count each combination
# Expected output:
# Male, Bachelor's: 2
# Male, Master's: 1
# Female, Bachelor's: 1
# Female, Master's: 1
Next Steps
Congratulations on completing the Basic Syntax module! You have now mastered:
- Variables and data types
- Operators
- Conditional statements
- Loops
In the next module, we'll learn about data structures (lists, dictionaries, etc.), which are core tools for data analysis.
Ready? Let's keep going!