Lists

Python's most commonly used data structure: the Python counterpart of R vectors and Stata variables

What is a List?

A list is Python's most fundamental data structure, used to store ordered, mutable collections of elements.

Comparative Understanding

Concept	Python	R	Stata
Ordered collection	`list`	`vector`	Variable
Example	`[1, 2, 3]`	`c(1, 2, 3)`	`gen x = ...`

Key Characteristics:

Ordered: Elements have a fixed order
Mutable: Can modify, add, delete elements
Allows duplicates: Same value can appear multiple times
Mixed types: Can contain different data types (but not recommended)
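The four characteristics above can be checked in a few lines. This is a minimal sketch; the variable names (`responses`, `record`) and their values are illustrative assumptions, not part of the original examples.

```python
# Illustrative data; names and values are assumptions for this sketch
responses = ["yes", "no", "yes", "maybe"]

# Ordered: position is preserved, so index 0 is always the first item
first_response = responses[0]

# Mutable: an element can be replaced in place
responses[3] = "no"

# Allows duplicates: "yes" appears twice
yes_count = responses.count("yes")

# Mixed types: allowed, though it makes later calculations harder
record = [25, "Alice", 3.14, True]
type_names = [type(item).__name__ for item in record]

print(first_response)  # yes
print(responses)       # ['yes', 'no', 'yes', 'no']
print(yes_count)       # 2
print(type_names)      # ['int', 'str', 'float', 'bool']
```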

Creating Lists

1. Basic Creation

python

# Empty list
empty_list = []

# Integer list
ages = [25, 30, 35, 40, 45]

# String list
names = ["Alice", "Bob", "Carol", "David"]

# Mixed types (not recommended, but possible)
mixed = [25, "Alice", 3.14, True]

print(ages)   # [25, 30, 35, 40, 45]
print(names)  # ['Alice', 'Bob', 'Carol', 'David']

2. Using range()

python

# Generate 0 to 9
numbers = list(range(10))
print(numbers)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Generate 1 to 10
numbers = list(range(1, 11))
print(numbers)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Generate even numbers
evens = list(range(0, 20, 2))
print(evens)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

3. Using List Comprehensions

python

# Generate square numbers
squares = [x**2 for x in range(1, 6)]
print(squares)  # [1, 4, 9, 16, 25]

# Generate even numbers
evens = [x for x in range(20) if x % 2 == 0]
print(evens)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
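A list comprehension can be read as a compact `for` loop that appends to a list. The sketch below writes both comprehensions above as explicit loops to show the correspondence; the `_loop` and `_comp` variable names are illustrative.

```python
# Loop version: build the list step by step
squares_loop = []
for x in range(1, 6):
    squares_loop.append(x**2)

# Comprehension version: same result in one expression
squares_comp = [x**2 for x in range(1, 6)]
print(squares_loop == squares_comp)  # True

# The optional "if" clause corresponds to an if-statement inside the loop
evens_loop = []
for x in range(20):
    if x % 2 == 0:
        evens_loop.append(x)

evens_comp = [x for x in range(20) if x % 2 == 0]
print(evens_loop == evens_comp)  # True
```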

Accessing List Elements

1. Index Access (starts at 0)

python

students = ["Alice", "Bob", "Carol", "David", "Emma"]

# Forward indexing (starts at 0)
print(students[0])   # Alice (1st)
print(students[1])   # Bob (2nd)
print(students[4])   # Emma (5th)

# Backward indexing (starts at -1)
print(students[-1])  # Emma (last)
print(students[-2])  # David (2nd from end)

⚠️ Note: Python indexing starts at 0, while R and Stata start at 1!

Language	First element	Last element
Python	`list[0]`	`list[-1]`
R	`vector[1]`	`vector[length(vector)]`
Stata	`var[1]`	`var[_N]`
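If you are used to 1-based positions from R or Stata, the conversion is simply "position minus one". The helper below is a hypothetical illustration of that rule (the name `nth` is an assumption, not a built-in), and is not something you would normally need in real code.

```python
students = ["Alice", "Bob", "Carol", "David", "Emma"]

def nth(items, position):
    """Return the item at a 1-based position (hypothetical helper)."""
    if not 1 <= position <= len(items):
        raise IndexError(f"position {position} is outside 1..{len(items)}")
    return items[position - 1]

print(nth(students, 1))              # Alice
print(nth(students, len(students)))  # Emma
```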

2. Slicing

python

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Basic slicing [start:end] (end not included)
print(numbers[2:5])    # [2, 3, 4]
print(numbers[0:3])    # [0, 1, 2]
print(numbers[5:])     # [5, 6, 7, 8, 9] (from index 5 to end)
print(numbers[:5])     # [0, 1, 2, 3, 4] (from start up to, but not including, index 5)
print(numbers[:])      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] (all)

# Slicing with step [start:end:step]
print(numbers[::2])    # [0, 2, 4, 6, 8] (every other)
print(numbers[1::2])   # [1, 3, 5, 7, 9] (elements at odd indices)
print(numbers[::-1])   # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] (reversed)
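Two further properties of slicing are worth knowing: a slice is a new list, and out-of-range slice bounds are clipped rather than raising an error. A short sketch (the variable name `head` is illustrative):

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# A slice is a new list; changing it leaves the original untouched
head = numbers[:3]
head[0] = 99
print(head)         # [99, 1, 2]
print(numbers[:3])  # [0, 1, 2]

# Out-of-range slice bounds are clipped instead of raising IndexError
print(numbers[7:100])  # [7, 8, 9]
print(numbers[50:60])  # []

# When both bounds are in range, numbers[start:end] has end - start elements
print(len(numbers[2:5]))  # 3
```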

Social Science Application Example:

python

# Respondent age data
ages = [25, 28, 30, 35, 40, 45, 50, 55, 60, 65]

# Get first 5 respondents
first_five = ages[:5]
print(first_five)  # [25, 28, 30, 35, 40]

# Get last 5 respondents
last_five = ages[-5:]
print(last_five)  # [45, 50, 55, 60, 65]

# Get middle-aged group (index 2 through 6; index 7 is not included)
middle_age = ages[2:7]
print(middle_age)  # [30, 35, 40, 45, 50]

✏️ Modifying Lists

1. Modifying Single Elements

python

scores = [85, 90, 78, 92, 88]

# Modify first score
scores[0] = 95
print(scores)  # [95, 90, 78, 92, 88]

# Modify last score
scores[-1] = 90
print(scores)  # [95, 90, 78, 92, 90]

2. Adding Elements

python

students = ["Alice", "Bob"]

# append(): Add one element at the end
students.append("Carol")
print(students)  # ['Alice', 'Bob', 'Carol']

# insert(): Insert at specified position
students.insert(1, "David")  # Insert at index 1
print(students)  # ['Alice', 'David', 'Bob', 'Carol']

# extend(): Add multiple elements
students.extend(["Emma", "Frank"])
print(students)  # ['Alice', 'David', 'Bob', 'Carol', 'Emma', 'Frank']

# + operator: Combine lists
new_students = students + ["Grace", "Henry"]
print(new_students)  # ['Alice', 'David', 'Bob', 'Carol', 'Emma', 'Frank', 'Grace', 'Henry']
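One difference among these is easy to miss: `+` builds a new list, while `append()`, `insert()`, and `extend()` change the list in place and return `None`. A minimal sketch, with illustrative variable names `combined` and `result`:

```python
students = ["Alice", "Bob"]

# + returns a new list; the original is unchanged
combined = students + ["Carol"]
print(students)  # ['Alice', 'Bob']
print(combined)  # ['Alice', 'Bob', 'Carol']

# append() modifies the list in place and returns None,
# so assigning its result is a common mistake
result = students.append("David")
print(result)    # None
print(students)  # ['Alice', 'Bob', 'David']
```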

3. Removing Elements

python

numbers = [1, 2, 3, 4, 5, 3]

# remove(): Remove first matching value
numbers.remove(3)
print(numbers)  # [1, 2, 4, 5, 3] (only removed first 3)

# pop(): Remove and return the element at the specified index (default: last)
last = numbers.pop()
print(last)      # 3
print(numbers)   # [1, 2, 4, 5]

first = numbers.pop(0)
print(first)     # 1
print(numbers)   # [2, 4, 5]

# del: Delete specified index or slice
del numbers[1]
print(numbers)   # [2, 5]

# clear(): Empty the list
numbers.clear()
print(numbers)   # []
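A few edge cases follow from the methods above: `remove()` raises `ValueError` when the value is absent, it removes only one occurrence, and `del` also works on slices. The sketch below shows one way to handle each; the name `without_threes` is illustrative.

```python
numbers = [1, 2, 3, 4, 5, 3]

# remove() raises ValueError when the value is absent, so check first
if 7 in numbers:
    numbers.remove(7)

# Remove every occurrence of a value by building a filtered list
without_threes = [n for n in numbers if n != 3]
print(without_threes)  # [1, 2, 4, 5]

# del also accepts a slice: drop the first two elements
del numbers[:2]
print(numbers)  # [3, 4, 5, 3]
```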

📊 List Operations

1. Basic Operations

python

numbers = [3, 1, 4, 1, 5, 9, 2, 6]

# Length
print(len(numbers))  # 8

# Max/min/sum
print(max(numbers))  # 9
print(min(numbers))  # 1
print(sum(numbers))  # 31

# Average (no built-in mean function; compute it, or use statistics.mean)
average = sum(numbers) / len(numbers)
print(average)  # 3.875

# Sorting (modifies original list)
numbers.sort()
print(numbers)  # [1, 1, 2, 3, 4, 5, 6, 9]

# Reverse
numbers.reverse()
print(numbers)  # [9, 6, 5, 4, 3, 2, 1, 1]

# sorted(): Returns new list (doesn't modify original)
numbers = [3, 1, 4, 1, 5, 9, 2, 6]
sorted_numbers = sorted(numbers)
print(sorted_numbers)  # [1, 1, 2, 3, 4, 5, 6, 9]
print(numbers)         # [3, 1, 4, 1, 5, 9, 2, 6] (original unchanged)
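Both `sort()` and `sorted()` accept the optional `reverse` and `key` arguments, which cover descending order and custom sort criteria. A short sketch with illustrative data:

```python
scores = [85, 92, 78, 90]
names = ["Carol", "alice", "Bob"]

# Descending order
print(sorted(scores, reverse=True))  # [92, 90, 85, 78]

# Default string sorting is case-sensitive: uppercase letters sort first
print(sorted(names))                 # ['Bob', 'Carol', 'alice']

# key=str.lower compares lowercased copies, giving alphabetical order
print(sorted(names, key=str.lower))  # ['alice', 'Bob', 'Carol']

# key=len sorts by string length; ties keep their original order
print(sorted(names, key=len))        # ['Bob', 'Carol', 'alice']
```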

2. Finding Elements

python

majors = ["Economics", "Sociology", "Economics", "Political Science"]

# count(): Count occurrences
print(majors.count("Economics"))  # 2

# index(): Find first occurrence index
print(majors.index("Sociology"))  # 1

# in: Check if exists
print("Economics" in majors)      # True
print("Physics" in majors)        # False
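Note that `index()` raises `ValueError` when the value is missing and reports only the first match. The sketch below guards the lookup with `in` and uses `enumerate()` to collect every matching position; `target`, `position`, and `econ_positions` are illustrative names.

```python
majors = ["Economics", "Sociology", "Economics", "Political Science"]

# index() raises ValueError for a missing value, so guard with "in"
target = "Physics"
position = majors.index(target) if target in majors else None
print(position)  # None

# index() returns only the first match; enumerate() finds all of them
econ_positions = [i for i, m in enumerate(majors) if m == "Economics"]
print(econ_positions)  # [0, 2]
```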

🔬 Real-World Cases

Case 1: Survey Score Statistics

python

# Student scores
scores = [85, 92, 78, 90, 88, 76, 95, 82, 89, 91]

# Descriptive statistics
print("=== Score Statistics ===")
print(f"Sample size: {len(scores)}")
print(f"Highest score: {max(scores)}")
print(f"Lowest score: {min(scores)}")
print(f"Total: {sum(scores)}")
print(f"Average: {sum(scores)/len(scores):.2f}")

# Passing rate
passing = [s for s in scores if s >= 60]
passing_rate = len(passing) / len(scores) * 100
print(f"Passing rate: {passing_rate:.1f}%")

# Grade distribution
excellent = len([s for s in scores if s >= 90])
good = len([s for s in scores if 80 <= s < 90])
fair = len([s for s in scores if 70 <= s < 80])
poor = len([s for s in scores if s < 70])

print(f"\n=== Grade Distribution ===")
print(f"Excellent (90+): {excellent} students")
print(f"Good (80-89): {good} students")
print(f"Fair (70-79): {fair} students")
print(f"Poor (below 70): {poor} students")

Case 2: Income Data Cleaning

python

# Raw income data (contains outliers)
raw_incomes = [50000, 65000, -5000, 80000, 1000000, 55000, 70000, 0]

# Data cleaning
print("=== Income Data Cleaning ===")
print(f"Original count: {len(raw_incomes)}")

# Filter valid data (positive and not exceeding 500k)
clean_incomes = []
for income in raw_incomes:
    if 0 < income <= 500000:
        clean_incomes.append(income)

print(f"After cleaning: {len(clean_incomes)}")
print(f"Removed: {len(raw_incomes) - len(clean_incomes)}")

# More concise (list comprehension)
clean_incomes = [inc for inc in raw_incomes if 0 < inc <= 500000]

# Statistics
print(f"\n=== Statistics After Cleaning ===")
print(f"Average income: ${sum(clean_incomes)/len(clean_incomes):,.0f}")
# Middle element of the sorted list; this equals the median only when the count is odd
print(f"Median: ${sorted(clean_incomes)[len(clean_incomes)//2]:,.0f}")
print(f"Highest income: ${max(clean_incomes):,.0f}")
print(f"Lowest income: ${min(clean_incomes):,.0f}")
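The median in this case is taken as the middle element of the sorted list, which is exact only for an odd number of values. For an even count, the standard library's `statistics.median` averages the two middle values. The sketch below compares the two approaches on illustrative samples (`odd_sample` and `even_sample` are assumed values for demonstration).

```python
import statistics

# Illustrative samples, assumed for this sketch
odd_sample = [50000, 55000, 65000, 70000, 80000]
even_sample = [50000, 55000, 65000, 70000]

# Odd count: the middle element is the median
print(statistics.median(odd_sample))   # 65000

# Even count: the median is the mean of the two middle values
print(statistics.median(even_sample))  # 60000.0

# The index shortcut picks the upper of the two middle values instead
print(sorted(even_sample)[len(even_sample) // 2])  # 65000
```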

Case 3: Grouped Statistics

python

# Respondent ages
ages = [22, 25, 28, 30, 35, 38, 42, 45, 50, 55, 60, 65, 28, 32, 40]

# Age grouping
youth = [age for age in ages if age < 30]
middle = [age for age in ages if 30 <= age < 50]
senior = [age for age in ages if age >= 50]

print("=== Age Group Statistics ===")
print(f"Youth (<30): {len(youth)} people, avg {sum(youth)/len(youth):.1f} years")
print(f"Middle (30-49): {len(middle)} people, avg {sum(middle)/len(middle):.1f} years")
print(f"Senior (50+): {len(senior)} people, avg {sum(senior)/len(senior):.1f} years")

# Group percentages
total = len(ages)
print(f"\n=== Age Distribution ===")
print(f"Youth: {len(youth)/total*100:.1f}%")
print(f"Middle: {len(middle)/total*100:.1f}%")
print(f"Senior: {len(senior)/total*100:.1f}%")
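The group averages above divide by `len(...)`, which raises `ZeroDivisionError` if a group happens to be empty. One possible guard is sketched below; `mean_or_none` is a hypothetical helper and the small `ages` sample is assumed for illustration.

```python
def mean_or_none(values):
    """Return the mean of values, or None for an empty list (hypothetical helper)."""
    return sum(values) / len(values) if values else None

ages = [22, 25, 28]  # illustrative sample with no one aged 50 or older
senior = [age for age in ages if age >= 50]

print(senior)                # []
print(mean_or_none(senior))  # None
print(mean_or_none(ages))    # 25.0
```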

🔄 Comparison: Python List vs R Vector vs Stata Variable

Creating Data

Operation	Python	R	Stata
Create	`ages = [25, 30, 35]`	`ages <- c(25, 30, 35)`	`gen age = ...`
Length	`len(ages)`	`length(ages)`	`count`
Access first	`ages[0]`	`ages[1]`	`age[1]`
Access last	`ages[-1]`	`ages[length(ages)]`	`age[_N]`

Statistical Operations

Operation	Python	R	Stata
Sum	`sum(ages)`	`sum(ages)`	`egen total = total(age)`
Mean	`sum(ages)/len(ages)`	`mean(ages)`	`summarize age`
Max	`max(ages)`	`max(ages)`	`egen max_age = max(age)`
Sort	`sorted(ages)`	`sort(ages)`	`sort age`

⚠️ Common Errors

Error 1: Index Out of Range

python

students = ["Alice", "Bob", "Carol"]
print(students[3])  # ❌ IndexError: list index out of range
print(students[2])  # ✅ Carol (last is index 2)
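When an index comes from user input or a calculation, there are two common ways to avoid the error: check the index against `len()` first, or catch `IndexError`. A minimal sketch of both, with illustrative variable names `checked` and `caught`:

```python
students = ["Alice", "Bob", "Carol"]
i = 3  # one past the last valid index, which is len(students) - 1

# Option 1: check the index before using it
if 0 <= i < len(students):
    checked = students[i]
else:
    checked = None

# Option 2: attempt the access and catch the exception
try:
    caught = students[i]
except IndexError:
    caught = None

print(checked)  # None
print(caught)   # None
```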

Error 2: Confusing append() and extend()

python

numbers = [1, 2, 3]

numbers.append([4, 5])
print(numbers)  # [1, 2, 3, [4, 5]] (entire list as one element)

numbers = [1, 2, 3]
numbers.extend([4, 5])
print(numbers)  # [1, 2, 3, 4, 5] (added individually)

Error 3: Direct Assignment vs Copying

python

# ❌ Direct assignment (both variables point to same list)
list1 = [1, 2, 3]
list2 = list1
list2.append(4)
print(list1)  # [1, 2, 3, 4] (list1 changed too!)

# ✅ Copy list
list1 = [1, 2, 3]
list2 = list1.copy()  # or list2 = list1[:]
list2.append(4)
print(list1)  # [1, 2, 3] (list1 unchanged)
print(list2)  # [1, 2, 3, 4]
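Both `copy()` and `[:]` make a shallow copy: the outer list is new, but any nested lists are still shared. For nested data, `copy.deepcopy` from the standard library copies the inner lists as well. The nested `data` below is an illustrative assumption.

```python
import copy

# Nested list: each inner list is one respondent's [age, income] (illustrative)
data = [[25, 50000], [30, 65000]]

shallow = data.copy()       # new outer list, same inner lists
deep = copy.deepcopy(data)  # new outer list and new inner lists

data[0][0] = 99             # change a value inside the first inner list

print(shallow[0][0])  # 99 (shares the inner list with data)
print(deep[0][0])     # 25 (fully independent)
```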

💪 Practice Problems

Exercise 1: GPA Calculation

python

# Student scores (0-100 scale)
scores = [85, 92, 78, 88, 95, 82, 90, 76, 89, 91]

# Tasks:
# 1. Convert to 4.0 GPA scale (90-100: 4.0, 80-89: 3.0, 70-79: 2.0, 60-69: 1.0)
# 2. Calculate average GPA
# 3. Count students in each grade bracket

Exercise 2: Data Filtering

python

# Respondent incomes
incomes = [45000, 75000, 120000, 35000, 95000, 60000, 150000, 50000]

# Tasks:
# 1. Filter middle-income earners (50000-100000)
# 2. Calculate average income of middle-income earners
# 3. Sort all incomes from low to high

Exercise 3: Survey ID Generation

python

# Task: Generate 100 survey IDs
# Format: Q001, Q002, Q003, ..., Q100
# Hint: Use list comprehension + string formatting

📚 Next Steps

In the next section, we'll learn about Tuples, the "immutable version" of lists, suitable for storing data that shouldn't be modified.

Keep learning!
