Lists
Python's most commonly used data structure: the Python counterpart of R vectors and Stata variables
What is a List?
A list is Python's most fundamental data structure, used to store ordered, mutable collections of elements.
Comparative Understanding
| Concept | Python | R | Stata |
|---|---|---|---|
| Ordered collection | list | vector | variable |
| Example | [1, 2, 3] | c(1, 2, 3) | gen x = ... |
Key Characteristics:
- Ordered: Elements keep the order in which they were added and are accessed by position
- Mutable: Can modify, add, delete elements
- Allows duplicates: Same value can appear multiple times
- Mixed types: Can contain different data types (but not recommended)
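The four characteristics above can be checked directly in a short session. The following is a minimal sketch; the list name `responses` and its values are invented purely for illustration.

```python
# Hypothetical survey responses, invented for this illustration
responses = [3, 5, 3, 4]

# Ordered: elements keep their positions
first, last = responses[0], responses[-1]
print(first, last)  # 3 4

# Mutable: an element can be replaced in place
responses[1] = 2
print(responses)  # [3, 2, 3, 4]

# Allows duplicates: the value 3 appears twice
print(responses.count(3))  # 2

# Mixed types: allowed, and each element keeps its own type
mixed = [25, "Alice", 3.14, True]
type_names = [type(item).__name__ for item in mixed]
print(type_names)  # ['int', 'str', 'float', 'bool']
```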
Creating Lists
1. Basic Creation
python
# Empty list
empty_list = []
# Integer list
ages = [25, 30, 35, 40, 45]
# String list
names = ["Alice", "Bob", "Carol", "David"]
# Mixed types (not recommended, but possible)
mixed = [25, "Alice", 3.14, True]
print(ages) # [25, 30, 35, 40, 45]
print(names) # ['Alice', 'Bob', 'Carol', 'David']
2. Using range()
python
# Generate 0 to 9
numbers = list(range(10))
print(numbers) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Generate 1 to 10
numbers = list(range(1, 11))
print(numbers) # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Generate even numbers
evens = list(range(0, 20, 2))
print(evens) # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
3. Using List Comprehensions
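A list comprehension is shorthand for a loop that builds a list one element at a time. The sketch here writes the same computation both ways so the correspondence is visible; the names `result_loop` and `result_comp` are chosen only for this illustration.

```python
# Loop version: start empty, test each value, append the transformed result
result_loop = []
for x in range(10):
    if x % 3 == 0:
        result_loop.append(x * 10)

# Comprehension version: same expression, same loop, same condition
result_comp = [x * 10 for x in range(10) if x % 3 == 0]

print(result_loop)                 # [0, 30, 60, 90]
print(result_loop == result_comp)  # True
```

Reading a comprehension from left to right gives the expression to keep, then the loop, then the optional filter.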
python
# Generate square numbers
squares = [x**2 for x in range(1, 6)]
print(squares) # [1, 4, 9, 16, 25]
# Generate even numbers
evens = [x for x in range(20) if x % 2 == 0]
print(evens) # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Accessing List Elements
1. Index Access (starts at 0)
python
students = ["Alice", "Bob", "Carol", "David", "Emma"]
# Forward indexing (starts at 0)
print(students[0]) # Alice (1st)
print(students[1]) # Bob (2nd)
print(students[4]) # Emma (5th)
# Backward indexing (starts at -1)
print(students[-1]) # Emma (last)
print(students[-2]) # David (2nd from end)
⚠️ Note: Python indexing starts at 0, while R and Stata start at 1!
| Language | First element | Last element |
|---|---|---|
| Python | list[0] | list[-1] |
| R | vector[1] | vector[length(vector)] |
| Stata | var[1] | var[_N] |
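Because indexing starts at 0, the last valid index is always one less than the length. A small sketch of that relationship, reusing the `students` list from the indexing example; the `position` variable is an assumption added for illustration.

```python
students = ["Alice", "Bob", "Carol", "David", "Emma"]

# The last valid forward index is len(students) - 1
last_index = len(students) - 1
print(last_index)                            # 4
print(students[last_index] == students[-1])  # True

# A negative index -k refers to the same element as len(students) - k
print(students[-2] == students[len(students) - 2])  # True

# Checking a position before using it avoids an IndexError
position = 5  # hypothetical position, one past the end
if 0 <= position < len(students):
    print(students[position])
else:
    print(f"Position {position} is out of range")  # this branch runs
```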
2. Slicing
python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Basic slicing [start:end] (end not included)
print(numbers[2:5]) # [2, 3, 4]
print(numbers[0:3]) # [0, 1, 2]
print(numbers[5:]) # [5, 6, 7, 8, 9] (from index 5 to end)
print(numbers[:5]) # [0, 1, 2, 3, 4] (from start up to, but not including, index 5)
print(numbers[:]) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] (all)
# Slicing with step [start:end:step]
print(numbers[::2]) # [0, 2, 4, 6, 8] (every other)
print(numbers[1::2]) # [1, 3, 5, 7, 9] (every other element, starting at index 1)
print(numbers[::-1]) # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] (reversed)
Social Science Application Example:
python
# Respondent age data
ages = [25, 28, 30, 35, 40, 45, 50, 55, 60, 65]
# Get first 5 respondents
first_five = ages[:5]
print(first_five) # [25, 28, 30, 35, 40]
# Get last 5 respondents
last_five = ages[-5:]
print(last_five) # [45, 50, 55, 60, 65]
# Get middle-aged group (indices 2 through 6; index 7 is excluded)
middle_age = ages[2:7]
print(middle_age) # [30, 35, 40, 45, 50]
✏️ Modifying Lists
1. Modifying Single Elements
python
scores = [85, 90, 78, 92, 88]
# Modify first score
scores[0] = 95
print(scores) # [95, 90, 78, 92, 88]
# Modify last score
scores[-1] = 90
print(scores) # [95, 90, 78, 92, 90]
2. Adding Elements
python
students = ["Alice", "Bob"]
# append(): Add one element at the end
students.append("Carol")
print(students) # ['Alice', 'Bob', 'Carol']
# insert(): Insert at specified position
students.insert(1, "David") # Insert at index 1
print(students) # ['Alice', 'David', 'Bob', 'Carol']
# extend(): Add multiple elements
students.extend(["Emma", "Frank"])
print(students) # ['Alice', 'David', 'Bob', 'Carol', 'Emma', 'Frank']
# + operator: Combine lists into a new list (students itself is unchanged)
new_students = students + ["Grace", "Henry"]
print(new_students) # ['Alice', 'David', 'Bob', 'Carol', 'Emma', 'Frank', 'Grace', 'Henry']
3. Removing Elements
python
numbers = [1, 2, 3, 4, 5, 3]
# remove(): Remove first matching value
numbers.remove(3)
print(numbers) # [1, 2, 4, 5, 3] (only removed first 3)
# pop(): Remove and return the element at the specified index (default: last)
last = numbers.pop()
print(last) # 3
print(numbers) # [1, 2, 4, 5]
first = numbers.pop(0)
print(first) # 1
print(numbers) # [2, 4, 5]
# del: Delete specified index or slice
del numbers[1]
print(numbers) # [2, 5]
# clear(): Empty the list
numbers.clear()
print(numbers) # []
📊 List Operations
1. Basic Operations
python
numbers = [3, 1, 4, 1, 5, 9, 2, 6]
# Length
print(len(numbers)) # 8
# Max/min/sum
print(max(numbers)) # 9
print(min(numbers)) # 1
print(sum(numbers)) # 31
# Average (no built-in mean function; compute it manually or use statistics.mean)
average = sum(numbers) / len(numbers)
print(average) # 3.875
# Sorting (modifies original list)
numbers.sort()
print(numbers) # [1, 1, 2, 3, 4, 5, 6, 9]
# Reverse
numbers.reverse()
print(numbers) # [9, 6, 5, 4, 3, 2, 1, 1]
# sorted(): Returns new list (doesn't modify original)
numbers = [3, 1, 4, 1, 5, 9, 2, 6]
sorted_numbers = sorted(numbers)
print(sorted_numbers) # [1, 1, 2, 3, 4, 5, 6, 9]
print(numbers) # [3, 1, 4, 1, 5, 9, 2, 6] (original unchanged)
2. Finding Elements
python
majors = ["Economics", "Sociology", "Economics", "Political Science"]
# count(): Count occurrences
print(majors.count("Economics")) # 2
# index(): Find first occurrence index
print(majors.index("Sociology")) # 1
# in: Check if exists
print("Economics" in majors) # True
print("Physics" in majors) # False
🔬 Real-World Cases
Case 1: Survey Score Statistics
python
# Student scores
scores = [85, 92, 78, 90, 88, 76, 95, 82, 89, 91]
# Descriptive statistics
print("=== Score Statistics ===")
print(f"Sample size: {len(scores)}")
print(f"Highest score: {max(scores)}")
print(f"Lowest score: {min(scores)}")
print(f"Total: {sum(scores)}")
print(f"Average: {sum(scores)/len(scores):.2f}")
# Passing rate
passing = [s for s in scores if s >= 60]
passing_rate = len(passing) / len(scores) * 100
print(f"Passing rate: {passing_rate:.1f}%")
# Grade distribution
excellent = len([s for s in scores if s >= 90])
good = len([s for s in scores if 80 <= s < 90])
fair = len([s for s in scores if 70 <= s < 80])
poor = len([s for s in scores if s < 70])
print(f"\n=== Grade Distribution ===")
print(f"Excellent (90+): {excellent} students")
print(f"Good (80-89): {good} students")
print(f"Fair (70-79): {fair} students")
print(f"Poor (<70): {poor} students")
Case 2: Income Data Cleaning
python
# Raw income data (contains outliers)
raw_incomes = [50000, 65000, -5000, 80000, 1000000, 55000, 70000, 0]
# Data cleaning
print("=== Income Data Cleaning ===")
print(f"Original count: {len(raw_incomes)}")
# Filter valid data (positive and not exceeding 500k)
clean_incomes = []
for income in raw_incomes:
    if 0 < income <= 500000:
        clean_incomes.append(income)
print(f"After cleaning: {len(clean_incomes)}")
print(f"Removed: {len(raw_incomes) - len(clean_incomes)}")
# More concise (list comprehension)
clean_incomes = [inc for inc in raw_incomes if 0 < inc <= 500000]
# Statistics
print(f"\n=== Statistics After Cleaning ===")
print(f"Average income: ${sum(clean_incomes)/len(clean_incomes):,.0f}")
# Median (this middle-index shortcut is exact only when the count is odd, as it is here)
print(f"Median: ${sorted(clean_incomes)[len(clean_incomes)//2]:,.0f}")
print(f"Highest income: ${max(clean_incomes):,.0f}")
print(f"Lowest income: ${min(clean_incomes):,.0f}")
Case 3: Grouped Statistics
python
# Respondent ages
ages = [22, 25, 28, 30, 35, 38, 42, 45, 50, 55, 60, 65, 28, 32, 40]
# Age grouping
youth = [age for age in ages if age < 30]
middle = [age for age in ages if 30 <= age < 50]
senior = [age for age in ages if age >= 50]
print("=== Age Group Statistics ===")
print(f"Youth (<30): {len(youth)} people, avg {sum(youth)/len(youth):.1f} years")
print(f"Middle (30-49): {len(middle)} people, avg {sum(middle)/len(middle):.1f} years")
print(f"Senior (50+): {len(senior)} people, avg {sum(senior)/len(senior):.1f} years")
# Group percentages
total = len(ages)
print(f"\n=== Age Distribution ===")
print(f"Youth: {len(youth)/total*100:.1f}%")
print(f"Middle: {len(middle)/total*100:.1f}%")
print(f"Senior: {len(senior)/total*100:.1f}%")
🔄 Comparison: Python List vs R Vector vs Stata Variable
Creating Data
| Operation | Python | R | Stata |
|---|---|---|---|
| Create | ages = [25, 30, 35] | ages <- c(25, 30, 35) | gen age = ... |
| Length | len(ages) | length(ages) | count |
| Access first | ages[0] | ages[1] | age[1] |
| Access last | ages[-1] | ages[length(ages)] | age[_N] |
Statistical Operations
| Operation | Python | R | Stata |
|---|---|---|---|
| Sum | sum(ages) | sum(ages) | egen total = total(age) |
| Mean | sum(ages)/len(ages) | mean(ages) | summarize age |
| Max | max(ages) | max(ages) | egen max_age = max(age) |
| Sort | sorted(ages) | sort(ages) | sort age |
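For the Python column, the standard library's `statistics` module offers ready-made functions as an alternative to computing the mean by hand. A minimal sketch, using the three ages from the table; the variable names `manual_mean`, `library_mean`, and `middle_value` are introduced only for this example.

```python
import statistics

ages = [25, 30, 35]

# Manual mean, as in the table
manual_mean = sum(ages) / len(ages)
print(manual_mean)  # 30.0

# Library mean and median from the statistics module
library_mean = statistics.mean(ages)
middle_value = statistics.median(ages)
print(library_mean == manual_mean)  # True
print(middle_value == 30)           # True

# sorted() returns a new list; reverse=True sorts from high to low
print(sorted(ages, reverse=True))  # [35, 30, 25]
```

Unlike the middle-index shortcut, `statistics.median` averages the two middle values when the list has an even number of elements.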
⚠️ Common Errors
Error 1: Index Out of Range
python
students = ["Alice", "Bob", "Carol"]
print(students[3]) # ❌ IndexError: list index out of range
print(students[2]) # ✅ Carol (last is index 2)
Error 2: Confusing append() and extend()
python
numbers = [1, 2, 3]
numbers.append([4, 5])
print(numbers) # [1, 2, 3, [4, 5]] (entire list as one element)
numbers = [1, 2, 3]
numbers.extend([4, 5])
print(numbers) # [1, 2, 3, 4, 5] (added individually)
Error 3: Direct Assignment vs Copying
python
# ❌ Direct assignment (both variables point to same list)
list1 = [1, 2, 3]
list2 = list1
list2.append(4)
print(list1) # [1, 2, 3, 4] (list1 changed too!)
# ✅ Copy list
list1 = [1, 2, 3]
list2 = list1.copy() # or list2 = list1[:]
list2.append(4)
print(list1) # [1, 2, 3] (list1 unchanged)
print(list2) # [1, 2, 3, 4]
💪 Practice Problems
Exercise 1: GPA Calculation
python
# Student scores (0-100 scale)
scores = [85, 92, 78, 88, 95, 82, 90, 76, 89, 91]
# Tasks:
# 1. Convert to 4.0 GPA scale (90-100: 4.0, 80-89: 3.0, 70-79: 2.0, 60-69: 1.0)
# 2. Calculate average GPA
# 3. Count students in each grade bracket
Exercise 2: Data Filtering
python
# Respondent incomes
incomes = [45000, 75000, 120000, 35000, 95000, 60000, 150000, 50000]
# Tasks:
# 1. Filter middle-income earners (50000-100000)
# 2. Calculate average income of middle-income earners
# 3. Sort all incomes from low to high
Exercise 3: Survey ID Generation
python
# Task: Generate 100 survey IDs
# Format: Q001, Q002, Q003, ..., Q100
# Hint: Use list comprehension + string formatting
📚 Next Steps
In the next section, we'll learn about Tuples, the "immutable version" of lists, suitable for storing data that shouldn't be modified.
Keep learning!