
Lists

Python's most commonly used data structure — understand the Python version of R vectors and Stata variables


What is a List?

A list is Python's most fundamental data structure, used to store ordered, mutable collections of elements.

Comparative Understanding

| Concept | Python | R | Stata |
|---|---|---|---|
| Ordered collection | list | vector | Variable |
| Example | [1, 2, 3] | c(1, 2, 3) | gen x = ... |

Key Characteristics:

  • Ordered: Elements keep their positions until you explicitly change the list
  • Mutable: Can modify, add, delete elements
  • Allows duplicates: Same value can appear multiple times
  • Mixed types: Can contain different data types (but not recommended)
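
The four characteristics above can be checked directly in a short session. This is only a minimal sketch; the variable name `responses` and its values are invented for illustration.

```python
# Hypothetical survey responses, invented for this illustration
responses = [3, 5, 3, 4]

# Ordered: elements are addressed by position
first_response = responses[0]        # 3

# Mutable: elements can be changed, added, and deleted in place
responses[1] = 2                     # [3, 2, 3, 4]
responses.append(5)                  # [3, 2, 3, 4, 5]
del responses[0]                     # [2, 3, 4, 5]

# Allows duplicates: the same value may appear more than once
responses.append(3)                  # [2, 3, 4, 5, 3]
threes = responses.count(3)          # 2

# Mixed types: allowed, but numeric operations such as sum() then fail
mixed = [25, "Alice", 3.14]
mixed_sum_failed = False
try:
    sum(mixed)
except TypeError:
    mixed_sum_failed = True

print(responses)         # [2, 3, 4, 5, 3]
print(mixed_sum_failed)  # True
```

The last step shows why mixed-type lists are discouraged: most analysis code assumes every element has the same type.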

Creating Lists

1. Basic Creation

python
# Empty list
empty_list = []

# Integer list
ages = [25, 30, 35, 40, 45]

# String list
names = ["Alice", "Bob", "Carol", "David"]

# Mixed types (not recommended, but possible)
mixed = [25, "Alice", 3.14, True]

print(ages)   # [25, 30, 35, 40, 45]
print(names)  # ['Alice', 'Bob', 'Carol', 'David']

2. Using range()

python
# Generate 0 to 9
numbers = list(range(10))
print(numbers)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Generate 1 to 10
numbers = list(range(1, 11))
print(numbers)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Generate even numbers
evens = list(range(0, 20, 2))
print(evens)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

3. Using List Comprehensions

python
# Generate square numbers
squares = [x**2 for x in range(1, 6)]
print(squares)  # [1, 4, 9, 16, 25]

# Generate even numbers
evens = [x for x in range(20) if x % 2 == 0]
print(evens)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
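
A list comprehension is shorthand for a loop that appends to a list. The sketch below builds the same list of even numbers both ways; the names `evens_loop` and `evens_comp` are illustrative only.

```python
# Loop version: start empty, append each value that passes the test
evens_loop = []
for x in range(20):
    if x % 2 == 0:
        evens_loop.append(x)

# Comprehension version: the same logic in a single expression
evens_comp = [x for x in range(20) if x % 2 == 0]

print(evens_loop == evens_comp)  # True
```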

Accessing List Elements

1. Index Access (starts at 0)

python
students = ["Alice", "Bob", "Carol", "David", "Emma"]

# Forward indexing (starts at 0)
print(students[0])   # Alice (1st)
print(students[1])   # Bob (2nd)
print(students[4])   # Emma (5th)

# Backward indexing (starts at -1)
print(students[-1])  # Emma (last)
print(students[-2])  # David (2nd from end)

⚠️ Note: Python indexing starts at 0, while R and Stata start at 1!

| Language | First element | Last element |
|---|---|---|
| Python | list[0] | list[-1] |
| R | vector[1] | vector[length(vector)] |
| Stata | var[1] | var[_N] |
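
When translating code from R or Stata, a 1-based position becomes a 0-based index by subtracting 1. A small sketch, where `position` is a hypothetical 1-based position as it would be written in R or Stata:

```python
students = ["Alice", "Bob", "Carol", "David", "Emma"]

# Hypothetical 1-based position, as it would be written in R or Stata
position = 3

# Python index = 1-based position minus 1
third_student = students[position - 1]
print(third_student)  # Carol

# The last element sits at index len(list) - 1, the same element as [-1]
last_matches = students[len(students) - 1] == students[-1]
print(last_matches)  # True
```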

2. Slicing

python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Basic slicing [start:end] (end not included)
print(numbers[2:5])    # [2, 3, 4]
print(numbers[0:3])    # [0, 1, 2]
print(numbers[5:])     # [5, 6, 7, 8, 9] (from index 5 to end)
print(numbers[:5])     # [0, 1, 2, 3, 4] (from start up to, but not including, index 5)
print(numbers[:])      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] (all)

# Slicing with step [start:end:step]
print(numbers[::2])    # [0, 2, 4, 6, 8] (every other)
print(numbers[1::2])   # [1, 3, 5, 7, 9] (elements at odd indexes)
print(numbers[::-1])   # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] (reversed)

Social Science Application Example:

python
# Respondent age data
ages = [25, 28, 30, 35, 40, 45, 50, 55, 60, 65]

# Get first 5 respondents
first_five = ages[:5]
print(first_five)  # [25, 28, 30, 35, 40]

# Get last 5 respondents
last_five = ages[-5:]
print(last_five)  # [45, 50, 55, 60, 65]

# Get middle-aged group (indexes 2 through 6; the end index 7 is excluded)
middle_age = ages[2:7]
print(middle_age)  # [30, 35, 40, 45, 50]
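
Two properties of slices matter when subsetting data: a slice returns a new list, and slice bounds beyond the end of the list are clipped instead of raising an error. A brief sketch, reusing the respondent ages:

```python
ages = [25, 28, 30, 35, 40, 45, 50, 55, 60, 65]

# A slice is a new list: changing it leaves the original untouched
first_five = ages[:5]
first_five[0] = 99
print(ages[0])        # 25

# Out-of-range slice bounds are clipped, not an error
tail = ages[8:100]
beyond = ages[20:]
print(tail)           # [60, 65]
print(beyond)         # []
```

Single-index access behaves differently: `ages[20]` raises an `IndexError`.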

✏️ Modifying Lists

1. Modifying Single Elements

python
scores = [85, 90, 78, 92, 88]

# Modify first score
scores[0] = 95
print(scores)  # [95, 90, 78, 92, 88]

# Modify last score
scores[-1] = 90
print(scores)  # [95, 90, 78, 92, 90]

2. Adding Elements

python
students = ["Alice", "Bob"]

# append(): Add one element at the end
students.append("Carol")
print(students)  # ['Alice', 'Bob', 'Carol']

# insert(): Insert at specified position
students.insert(1, "David")  # Insert at index 1
print(students)  # ['Alice', 'David', 'Bob', 'Carol']

# extend(): Add multiple elements
students.extend(["Emma", "Frank"])
print(students)  # ['Alice', 'David', 'Bob', 'Carol', 'Emma', 'Frank']

# + operator: Combine lists
new_students = students + ["Grace", "Henry"]
print(new_students)  # ['Alice', 'David', 'Bob', 'Carol', 'Emma', 'Frank', 'Grace', 'Henry']

3. Removing Elements

python
numbers = [1, 2, 3, 4, 5, 3]

# remove(): Remove first matching value
numbers.remove(3)
print(numbers)  # [1, 2, 4, 5, 3] (only removed first 3)

# pop(): Remove element at specified index (default: last)
last = numbers.pop()
print(last)      # 3
print(numbers)   # [1, 2, 4, 5]

first = numbers.pop(0)
print(first)     # 1
print(numbers)   # [2, 4, 5]

# del: Delete specified index or slice
del numbers[1]
print(numbers)   # [2, 5]

# clear(): Empty the list
numbers.clear()
print(numbers)   # []
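
Note that `remove()` raises a `ValueError` when the value is not in the list, and `pop()` raises an `IndexError` on an empty list. One way to guard against the first case is a membership test, sketched here with made-up data:

```python
numbers = [1, 2, 4, 5]

# remove() fails if the value is absent, so check with `in` first
if 3 in numbers:
    numbers.remove(3)
print(numbers)  # [1, 2, 4, 5] (nothing to remove)

# To drop every occurrence of a value, build a filtered list instead
codes = [1, 9, 2, 9, 3]  # hypothetical data; 9 marks a missing answer here
valid_codes = [c for c in codes if c != 9]
print(valid_codes)  # [1, 2, 3]
```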

📊 List Operations

1. Basic Operations

python
numbers = [3, 1, 4, 1, 5, 9, 2, 6]

# Length
print(len(numbers))  # 8

# Max/min/sum
print(max(numbers))  # 9
print(min(numbers))  # 1
print(sum(numbers))  # 31

# Average (no built-in function; compute it manually or use statistics.mean)
average = sum(numbers) / len(numbers)
print(average)  # 3.875

# Sorting (modifies original list)
numbers.sort()
print(numbers)  # [1, 1, 2, 3, 4, 5, 6, 9]

# Reverse
numbers.reverse()
print(numbers)  # [9, 6, 5, 4, 3, 2, 1, 1]

# sorted(): Returns new list (doesn't modify original)
numbers = [3, 1, 4, 1, 5, 9, 2, 6]
sorted_numbers = sorted(numbers)
print(sorted_numbers)  # [1, 1, 2, 3, 4, 5, 6, 9]
print(numbers)         # [3, 1, 4, 1, 5, 9, 2, 6] (original unchanged)
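
Both `sort()` and `sorted()` accept `reverse=True` for descending order and a `key` function for custom ordering. A short sketch with illustrative data:

```python
scores = [85, 92, 78, 90]

# Descending order without a separate reverse() step
descending = sorted(scores, reverse=True)
print(descending)  # [92, 90, 85, 78]

# key= sorts by a computed value; here, by the length of each name
names = ["Carol", "Bob", "Alexander"]
by_length = sorted(names, key=len)
print(by_length)  # ['Bob', 'Carol', 'Alexander']
```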

2. Finding Elements

python
majors = ["Economics", "Sociology", "Economics", "Political Science"]

# count(): Count occurrences
print(majors.count("Economics"))  # 2

# index(): Find first occurrence index
print(majors.index("Sociology"))  # 1

# in: Check if exists
print("Economics" in majors)      # True
print("Physics" in majors)        # False
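
Unlike `count()` and `in`, `index()` raises a `ValueError` when the value is absent. A cautious pattern, sketched with the same `majors` list (the names `target`, `position`, and `econ_positions` are illustrative):

```python
majors = ["Economics", "Sociology", "Economics", "Political Science"]

# Check membership before calling index() to avoid a ValueError
target = "Physics"
if target in majors:
    position = majors.index(target)
else:
    position = None  # sentinel chosen for this sketch
print(position)  # None

# index() returns only the first match; enumerate() finds all positions
econ_positions = [i for i, m in enumerate(majors) if m == "Economics"]
print(econ_positions)  # [0, 2]
```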

🔬 Real-World Cases

Case 1: Survey Score Statistics

python
# Student scores
scores = [85, 92, 78, 90, 88, 76, 95, 82, 89, 91]

# Descriptive statistics
print("=== Score Statistics ===")
print(f"Sample size: {len(scores)}")
print(f"Highest score: {max(scores)}")
print(f"Lowest score: {min(scores)}")
print(f"Total: {sum(scores)}")
print(f"Average: {sum(scores)/len(scores):.2f}")

# Passing rate
passing = [s for s in scores if s >= 60]
passing_rate = len(passing) / len(scores) * 100
print(f"Passing rate: {passing_rate:.1f}%")

# Grade distribution
excellent = len([s for s in scores if s >= 90])
good = len([s for s in scores if 80 <= s < 90])
fair = len([s for s in scores if 70 <= s < 80])
passed = len([s for s in scores if 60 <= s < 70])
failed = len([s for s in scores if s < 60])

print("\n=== Grade Distribution ===")
print(f"Excellent (90+): {excellent} students")
print(f"Good (80-89): {good} students")
print(f"Fair (70-79): {fair} students")
print(f"Pass (60-69): {passed} students")
print(f"Fail (<60): {failed} students")

Case 2: Income Data Cleaning

python
import statistics

# Raw income data (contains outliers)
raw_incomes = [50000, 65000, -5000, 80000, 1000000, 55000, 70000, 0]

# Data cleaning
print("=== Income Data Cleaning ===")
print(f"Original count: {len(raw_incomes)}")

# Filter valid data (positive and not exceeding 500k)
clean_incomes = []
for income in raw_incomes:
    if 0 < income <= 500000:
        clean_incomes.append(income)

print(f"After cleaning: {len(clean_incomes)}")
print(f"Removed: {len(raw_incomes) - len(clean_incomes)}")

# Equivalent, more concise version (list comprehension)
clean_incomes = [inc for inc in raw_incomes if 0 < inc <= 500000]

# Statistics
# statistics.median() averages the two middle values when the count is even;
# indexing sorted(...)[n // 2] is only correct for an odd count
print("\n=== Statistics After Cleaning ===")
print(f"Average income: ${sum(clean_incomes)/len(clean_incomes):,.0f}")
print(f"Median: ${statistics.median(clean_incomes):,.0f}")
print(f"Highest income: ${max(clean_incomes):,.0f}")
print(f"Lowest income: ${min(clean_incomes):,.0f}")

Case 3: Grouped Statistics

python
# Respondent ages
ages = [22, 25, 28, 30, 35, 38, 42, 45, 50, 55, 60, 65, 28, 32, 40]

# Age grouping
youth = [age for age in ages if age < 30]
middle = [age for age in ages if 30 <= age < 50]
senior = [age for age in ages if age >= 50]

print("=== Age Group Statistics ===")
print(f"Youth (<30): {len(youth)} people, avg {sum(youth)/len(youth):.1f} years")
print(f"Middle (30-49): {len(middle)} people, avg {sum(middle)/len(middle):.1f} years")
print(f"Senior (50+): {len(senior)} people, avg {sum(senior)/len(senior):.1f} years")

# Group percentages
total = len(ages)
print(f"\n=== Age Distribution ===")
print(f"Youth: {len(youth)/total*100:.1f}%")
print(f"Middle: {len(middle)/total*100:.1f}%")
print(f"Senior: {len(senior)/total*100:.1f}%")
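
The averages above divide by the group size, which fails with a `ZeroDivisionError` if a group happens to be empty. A possible guard is a small helper; `group_mean` is a hypothetical name introduced for this sketch, not a built-in.

```python
def group_mean(values):
    """Return the mean of values, or None if the list is empty."""
    if not values:
        return None
    return sum(values) / len(values)

# Smaller sample, invented so that one group is empty
ages = [22, 25, 28, 30, 35]
youth = [age for age in ages if age < 30]
senior = [age for age in ages if age >= 50]  # empty for this sample

youth_mean = group_mean(youth)
senior_mean = group_mean(senior)
print(youth_mean)   # 25.0
print(senior_mean)  # None
```

Returning `None` is one design choice among several; raising a descriptive error or returning `float("nan")` would also work, depending on how the result is used downstream.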

🔄 Comparison: Python List vs R Vector vs Stata Variable

Creating Data

| Operation | Python | R | Stata |
|---|---|---|---|
| Create | ages = [25, 30, 35] | ages <- c(25, 30, 35) | gen age = ... |
| Length | len(ages) | length(ages) | count |
| Access first | ages[0] | ages[1] | age[1] |
| Access last | ages[-1] | ages[length(ages)] | age[_N] |

Statistical Operations

| Operation | Python | R | Stata |
|---|---|---|---|
| Sum | sum(ages) | sum(ages) | egen total = sum(age) |
| Mean | sum(ages)/len(ages) | mean(ages) | summarize age |
| Max | max(ages) | max(ages) | egen max_age = max(age) |
| Sort | sorted(ages) | sort(ages) | sort age |
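
For the mean and related statistics, Python's standard library also offers the `statistics` module, which is closer to R's `mean()` and `median()` than the manual formula. A minimal example with illustrative values:

```python
import statistics

ages = [25, 30, 35, 40]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)  # averages the two middle values
sd_age = statistics.stdev(ages)       # sample standard deviation

print(mean_age)    # 32.5
print(median_age)  # 32.5
print(round(sd_age, 2))
```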

⚠️ Common Errors

Error 1: Index Out of Range

python
students = ["Alice", "Bob", "Carol"]
print(students[3])  # ❌ IndexError: list index out of range
print(students[2])  # ✅ Carol (last is index 2)
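
One way to avoid the error is to compare the index against `len()` before accessing the element; another is to catch the `IndexError`. A short sketch, where `index`, `checked`, and `caught` are names chosen for illustration:

```python
students = ["Alice", "Bob", "Carol"]
index = 3

# Option 1: valid indexes run from 0 to len(students) - 1
if 0 <= index < len(students):
    checked = students[index]
else:
    checked = None  # sentinel chosen for this sketch

# Option 2: try the access and handle the failure
try:
    caught = students[index]
except IndexError:
    caught = None

print(checked, caught)  # None None
```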

Error 2: Confusing append() and extend()

python
numbers = [1, 2, 3]

numbers.append([4, 5])
print(numbers)  # [1, 2, 3, [4, 5]] (entire list as one element)

numbers = [1, 2, 3]
numbers.extend([4, 5])
print(numbers)  # [1, 2, 3, 4, 5] (added individually)

Error 3: Direct Assignment vs Copying

python
# ❌ Direct assignment (both variables point to same list)
list1 = [1, 2, 3]
list2 = list1
list2.append(4)
print(list1)  # [1, 2, 3, 4] (list1 changed too!)

# ✅ Copy list
list1 = [1, 2, 3]
list2 = list1.copy()  # or list2 = list1[:]
list2.append(4)
print(list1)  # [1, 2, 3] (list1 unchanged)
print(list2)  # [1, 2, 3, 4]
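
`copy()` and `[:]` make a shallow copy: the outer list is new, but any nested lists inside it are still shared. For nested data, `copy.deepcopy()` from the standard library copies every level. An illustration with made-up data:

```python
import copy

# Hypothetical nested data: one inner list of scores per respondent
nested = [[1, 2], [3, 4]]

shallow = nested.copy()
deep = copy.deepcopy(nested)

# Changing an inner list through the shallow copy also changes the original
shallow[0].append(99)
print(nested)  # [[1, 2, 99], [3, 4]]
print(deep)    # [[1, 2], [3, 4]]
```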

💪 Practice Problems

Exercise 1: GPA Calculation

python
# Student scores (0-100 scale)
scores = [85, 92, 78, 88, 95, 82, 90, 76, 89, 91]

# Tasks:
# 1. Convert to 4.0 GPA scale (90-100: 4.0, 80-89: 3.0, 70-79: 2.0, 60-69: 1.0)
# 2. Calculate average GPA
# 3. Count students in each grade bracket

Exercise 2: Data Filtering

python
# Respondent incomes
incomes = [45000, 75000, 120000, 35000, 95000, 60000, 150000, 50000]

# Tasks:
# 1. Filter middle-income earners (50000-100000)
# 2. Calculate average income of middle-income earners
# 3. Sort all incomes from low to high

Exercise 3: Survey ID Generation

python
# Task: Generate 100 survey IDs
# Format: Q001, Q002, Q003, ..., Q100
# Hint: Use list comprehension + string formatting

📚 Next Steps

In the next section, we'll learn about Tuples, the "immutable version" of lists, suitable for storing data that shouldn't be modified.

Keep learning!

Released under the MIT License. Content © Author.