Sets

Unordered collections of unique elements: the tool for deduplication and set operations

What is a Set?

A set is a built-in Python data structure that stores unique elements in no particular order, modeled on the mathematical concept of a set.

Key Characteristics:

Unordered: No indexing, cannot use set[0]
Unique: Automatically removes duplicate elements
Mutable: Can add/remove elements
Fast lookup: Checking whether an element exists takes O(1) time on average

Main Uses:

Deduplication
Membership checking (whether an element exists)
Set operations (intersection, union, difference)

Creating Sets

python

# Empty set (note: cannot use {}, that's an empty dictionary)
empty_set = set()

# Basic set
majors = {"Economics", "Sociology", "Political Science"}

# From list (auto-deduplication)
ages = [25, 30, 25, 35, 30, 40]
unique_ages = set(ages)
print(unique_ages)  # {25, 30, 35, 40} (order may vary)

# From string (split into characters)
letters = set("hello")
print(letters)  # {'h', 'e', 'l', 'o'} (auto-deduplication)

✏️ Basic Operations

1. Adding Elements

python

majors = {"Economics", "Sociology"}

# add(): Add single element
majors.add("Political Science")
print(majors)  # {'Economics', 'Sociology', 'Political Science'}

# Adding duplicate element (no effect)
majors.add("Economics")
print(majors)  # Still 3 elements (auto-deduplication)

# update(): Add multiple elements
majors.update(["Psychology", "Anthropology"])
print(majors)  # 5 elements
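
One pitfall is worth a quick sketch here: `update()` iterates over whatever it receives, so a bare string is added character by character rather than as one element. The variable names `courses` and `wrong` below are made up for illustration; the example also shows `|=`, the operator form of `update()` when the right-hand side is a set.

```python
courses = {"Economics", "Sociology"}

# add() inserts its argument as a single element
courses.add("Psychology")

# update() iterates over its argument: a list adds each item
courses.update(["Anthropology", "History"])

# |= is the operator form of update() when the right side is a set
courses |= {"Geography"}
print(len(courses))  # 6

# Pitfall: a bare string is iterated character by character
wrong = {"Economics"}
wrong.update("Law")
print(sorted(wrong))  # ['Economics', 'L', 'a', 'w']
```

To add a single string, use `add()` or wrap it in a list: `wrong.update(["Law"])`.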

2. Removing Elements

python

majors = {"Economics", "Sociology", "Political Science"}

# remove(): Remove element (raises error if doesn't exist)
majors.remove("Sociology")
print(majors)

# discard(): Remove element (no error if doesn't exist)
majors.discard("Physics")  # No error even if doesn't exist

# pop(): Remove and return an arbitrary element (not truly random; raises KeyError if the set is empty)
removed = majors.pop()
print(f"Removed: {removed}")

# clear(): Empty the set
majors.clear()
print(majors)  # set()
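
The difference between `remove()` and `discard()` only shows up when the element is missing, so the following minimal sketch triggers that case on purpose. It also drains a set with `pop()` to show that popping from an empty set raises `KeyError` as well.

```python
majors = {"Economics", "Sociology"}

# remove() raises KeyError for a missing element
try:
    majors.remove("Physics")
except KeyError as err:
    print(f"Not in set: {err}")  # Not in set: 'Physics'

# discard() is the quiet alternative: nothing happens, nothing is raised
majors.discard("Physics")

# pop() removes an arbitrary element each time it is called
removed = []
while majors:
    removed.append(majors.pop())

# Once the set is empty, pop() raises KeyError
try:
    majors.pop()
except KeyError:
    print("Cannot pop from an empty set")

print(sorted(removed))  # ['Economics', 'Sociology']
```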

3. Membership Checking

python

majors = {"Economics", "Sociology", "Political Science"}

# Check if element exists
print("Economics" in majors)  # True
print("Physics" in majors)    # False

# Count and iteration
print(len(majors))  # 3

for major in majors:
    print(major)

🔄 Set Operations

1. Union

python

# Respondent IDs from two surveys
survey1 = {101, 102, 103, 104}
survey2 = {103, 104, 105, 106}

# Union: all participants
all_respondents = survey1 | survey2
# or
all_respondents = survey1.union(survey2)

print(all_respondents)  # {101, 102, 103, 104, 105, 106}
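
The operator and the method are not fully interchangeable. As a small sketch (the `late_responses` and `phone_responses` data are hypothetical), `union()` accepts any number of iterables of any kind, while `|` requires a set on both sides:

```python
survey1 = {101, 102, 103, 104}
late_responses = [104, 107]    # a list, not a set (hypothetical data)
phone_responses = (108, 101)   # a tuple (hypothetical data)

# union() accepts one or more iterables of any kind
combined = survey1.union(late_responses, phone_responses)
print(sorted(combined))  # [101, 102, 103, 104, 107, 108]

# The | operator requires sets on both sides
try:
    survey1 | late_responses
except TypeError as err:
    print(f"TypeError: {err}")

# union() returns a new set; the original is unchanged
print(sorted(survey1))  # [101, 102, 103, 104]
```

The same rule applies to `intersection()`, `difference()`, and `symmetric_difference()` versus their operators.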

2. Intersection

python

# Intersection: participated in both surveys
both_surveys = survey1 & survey2
# or
both_surveys = survey1.intersection(survey2)

print(both_surveys)  # {103, 104}

3. Difference

python

# Difference: only participated in first survey
only_first = survey1 - survey2
# or
only_first = survey1.difference(survey2)

print(only_first)  # {101, 102}

# Reverse difference
only_second = survey2 - survey1
print(only_second)  # {105, 106}

4. Symmetric Difference

python

# Symmetric difference: participated in only one survey (not both)
only_one_survey = survey1 ^ survey2
# or
only_one_survey = survey1.symmetric_difference(survey2)

print(only_one_survey)  # {101, 102, 105, 106}
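
Symmetric difference can be rebuilt from the three operations introduced earlier, which may help when checking your understanding. This short sketch compares the `^` operator with two equivalent formulations:

```python
survey1 = {101, 102, 103, 104}
survey2 = {103, 104, 105, 106}

via_operator = survey1 ^ survey2

# Everything in either survey, minus those in both
via_union_minus_intersection = (survey1 | survey2) - (survey1 & survey2)

# Only-in-first combined with only-in-second
via_two_differences = (survey1 - survey2) | (survey2 - survey1)

print(via_operator == via_union_minus_intersection == via_two_differences)  # True
print(sorted(via_operator))  # [101, 102, 105, 106]
```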

Set Operations Summary:

Operation	Symbol	Method	Meaning
Union	`A | B`	`A.union(B)`	Elements in A or B (or both)
Intersection	`A & B`	`A.intersection(B)`	Elements in both A and B
Difference	`A - B`	`A.difference(B)`	Elements in A but not B
Symmetric Difference	`A ^ B`	`A.symmetric_difference(B)`	Elements in A or B, but not both
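
Each of the four operations also has an in-place form that modifies the left-hand set instead of returning a new one. The sketch below works on copies of a small example set `a` so that the original stays intact; the variable names are illustrative only.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union_set = set(a)
union_set |= b    # same as union_set.update(b)

common = set(a)
common &= b       # same as common.intersection_update(b)

only_a = set(a)
only_a -= b       # same as only_a.difference_update(b)

either = set(a)
either ^= b       # same as either.symmetric_difference_update(b)

print(sorted(union_set))  # [1, 2, 3, 4, 5]
print(sorted(common))     # [3, 4]
print(sorted(only_a))     # [1, 2]
print(sorted(either))     # [1, 2, 5]
print(sorted(a))          # [1, 2, 3, 4] (only the copies were modified)
```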

🔬 Real-World Cases

Case 1: Data Deduplication

python

# Respondent IDs (with duplicates)
respondent_ids = [1001, 1002, 1001, 1003, 1002, 1004, 1003]

# Deduplicate
unique_ids = set(respondent_ids)
print(f"Original count: {len(respondent_ids)}")
print(f"After deduplication: {len(unique_ids)}")
print(f"Duplicates removed: {len(respondent_ids) - len(unique_ids)}")

# Convert back to a sorted list (sorted() already returns a list)
unique_ids_list = sorted(unique_ids)
print(unique_ids_list)  # [1001, 1002, 1003, 1004]
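
Converting to a set discards the original order, and sorting imposes a new one. When the first-seen order matters, one common approach (a sketch, using hypothetical IDs in `arrival_order`) is `dict.fromkeys`, which relies on dictionaries preserving insertion order in Python 3.7+. A "seen" set can additionally report which IDs were repeated.

```python
arrival_order = [1003, 1001, 1003, 1002, 1001]  # hypothetical IDs

# dict.fromkeys keeps the first occurrence of each key, in order
unique_in_order = list(dict.fromkeys(arrival_order))
print(unique_in_order)  # [1003, 1001, 1002]

# A "seen" set also reports which IDs were repeated
seen = set()
duplicates = set()
for rid in arrival_order:
    if rid in seen:
        duplicates.add(rid)
    else:
        seen.add(rid)

print(sorted(duplicates))  # [1001, 1003]
```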

Case 2: Finding New Respondents

python

# First wave respondents
wave1 = {1001, 1002, 1003, 1004, 1005}

# Second wave respondents
wave2 = {1003, 1004, 1005, 1006, 1007, 1008}

# Analysis
print("=== Survey Analysis ===")
print(f"Wave 1: {len(wave1)} people")
print(f"Wave 2: {len(wave2)} people")
print(f"Both waves: {len(wave1 & wave2)} people")
print(f"New respondents: {len(wave2 - wave1)} people → {wave2 - wave1}")
print(f"Lost respondents: {len(wave1 - wave2)} people → {wave1 - wave2}")
print(f"Total coverage: {len(wave1 | wave2)} people")

Case 3: Survey Quality Check

python

# Required fields
required_fields = {"id", "age", "gender", "income"}

# Respondent 1 data
respondent1 = {"id", "age", "gender", "income", "education"}
respondent2 = {"id", "age", "gender"}  # Missing income

# Check completeness
print("=== Respondent 1 ===")
missing1 = required_fields - respondent1
if missing1:
    print(f"❌ Missing fields: {missing1}")
else:
    print("✅ Data complete")

print("\n=== Respondent 2 ===")
missing2 = required_fields - respondent2
if missing2:
    print(f"❌ Missing fields: {missing2}")
else:
    print("✅ Data complete")
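
In practice, respondent data is more likely to arrive as dictionaries than as sets of field names. The following sketch assumes that shape; the helper `missing_fields` and the sample `records` are hypothetical, not part of any library. It checks only whether a key is present, not whether its value is empty.

```python
REQUIRED_FIELDS = {"id", "age", "gender", "income"}

def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required field names that are absent from a record dict."""
    # set(record) collects the dict's keys
    return required - set(record)

# Hypothetical respondent records
records = [
    {"id": 1, "age": 34, "gender": "F", "income": 52000, "education": "BA"},
    {"id": 2, "age": 41, "gender": "M"},
]

for record in records:
    missing = missing_fields(record)
    status = "complete" if not missing else f"missing {sorted(missing)}"
    print(f"Respondent {record['id']}: {status}")
# Respondent 1: complete
# Respondent 2: missing ['income']
```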

Case 4: Course Enrollment Analysis

python

# Students enrolled in different courses
econ_students = {"Alice", "Bob", "Carol", "David", "Emma"}
stat_students = {"Bob", "Carol", "Frank", "Grace"}
python_students = {"Alice", "Carol", "Emma", "Frank", "Henry"}

# Analysis
print("=== Course Enrollment Analysis ===")

# Students taking all three courses
all_three = econ_students & stat_students & python_students
print(f"All three courses: {all_three}")

# Students taking at least one course
at_least_one = econ_students | stat_students | python_students
print(f"At least one course: {len(at_least_one)} students")

# Students taking only economics
only_econ = econ_students - stat_students - python_students
print(f"Only economics: {only_econ}")

# Students taking economics or statistics but not Python
econ_or_stat_not_python = (econ_students | stat_students) - python_students
print(f"Econ/Stat but not Python: {econ_or_stat_not_python}")

🚀 Advanced Techniques

1. Frozen Sets (frozenset)

A frozenset is an immutable set. Because it cannot change, it is hashable and can be used as a dictionary key or as an element of another set.

python

# Regular sets are unhashable, so they cannot be nested
# s = {{1, 2}, {3, 4}}  # ❌ TypeError: unhashable type: 'set'

# frozenset can
s = {frozenset({1, 2}), frozenset({3, 4})}
print(s)  # {frozenset({1, 2}), frozenset({3, 4})}

# As dictionary keys
survey_participants = {
    frozenset({1001, 1002}): "Group 1",
    frozenset({1003, 1004}): "Group 2"
}
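
A few properties of `frozenset` are easy to verify directly. This sketch reuses the group dictionary from the example and shows that lookup ignores member order, that set operations still work, and that mutating methods are absent:

```python
group = frozenset({1001, 1002})

survey_participants = {
    frozenset({1001, 1002}): "Group 1",
    frozenset({1003, 1004}): "Group 2",
}

# Lookup does not depend on the order in which members are listed
print(survey_participants[frozenset([1002, 1001])])  # Group 1

# Set operations work; the result takes the type of the left operand
merged = group | {1003}
print(type(merged).__name__)  # frozenset

# Mutating methods such as add() do not exist on a frozenset
print(hasattr(group, "add"))  # False
```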

2. Set Comprehensions

python

# Generate unique squares from list
numbers = [1, 2, 2, 3, 3, 3, 4]
squares = {x**2 for x in numbers}
print(squares)  # {1, 4, 9, 16}

# Filter even number squares
even_squares = {x**2 for x in range(10) if x % 2 == 0}
print(even_squares)  # {0, 4, 16, 36, 64}
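
Set comprehensions are also handy for cleaning text before deduplicating. As an illustration with made-up input (`raw_majors`), the sketch below normalizes whitespace and case so that variants of the same answer collapse into one element:

```python
# Hypothetical free-text survey answers
raw_majors = [" Economics", "economics", "Sociology ", "SOCIOLOGY", "Math"]

# Strip whitespace and lowercase each answer while building the set
normalized = {m.strip().lower() for m in raw_majors}
print(sorted(normalized))  # ['economics', 'math', 'sociology']
```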

3. Subset and Superset Testing

python

# Define sets
social_science = {"Economics", "Sociology", "Political Science"}
all_majors = {"Economics", "Sociology", "Political Science", "Physics", "Math"}

# Test subset
print(social_science.issubset(all_majors))  # True
print(social_science <= all_majors)         # True (equivalent)

# Test superset
print(all_majors.issuperset(social_science))  # True
print(all_majors >= social_science)           # True (equivalent)

# Test disjoint
physics = {"Physics", "Chemistry"}
print(social_science.isdisjoint(physics))  # True (no intersection)
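
The `<=` and `>=` operators treat equal sets as subsets and supersets of each other. For a strict ("proper") comparison, Python also provides `<` and `>`. The set names in this sketch are illustrative:

```python
core = {"Economics", "Sociology"}
same = {"Sociology", "Economics"}
wider = {"Economics", "Sociology", "Physics"}

print(core <= same)  # True: every set is a subset of an equal set
print(core < same)   # False: not a proper subset, the sets are equal
print(core < wider)  # True: wider has at least one extra element
print(wider > core)  # True: proper superset

# issubset() also accepts non-set iterables; the <= operator does not
print(core.issubset(["Economics", "Sociology", "Math"]))  # True
```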

🤔 When to Use Sets?

Scenario	Use List	Use Set
Preserve order	✅	❌
Allow duplicates	✅	❌
Fast lookup	❌	✅
Deduplication	❌	✅
Set operations	❌	✅
Access by index	✅	❌

Example:

python

# ❌ Using list for lookup (slow)
students = ["Alice", "Bob", "Carol"]  # imagine 1,000 students here
if "Alice" in students:  # Scans elements one by one, O(n)
    print("Found")

# ✅ Using set for lookup (fast)
students = {"Alice", "Bob", "Carol"}  # imagine 1,000 students here
if "Alice" in students:  # Hash lookup, O(1) on average
    print("Found")
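
To see the difference on your own machine, a rough timing sketch with the standard-library `timeit` module is shown below. The collection size and repeat count are arbitrary choices, and the measured times will vary between runs and systems, so no expected output is given.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: the last element

# Time 100 membership checks against each container
list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

For a handful of elements the gap is negligible; it grows with the size of the collection.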

⚠️ Common Errors

Error 1: Trying to Use Indexing

python

majors = {"Economics", "Sociology"}
print(majors[0])  # ❌ TypeError: 'set' object is not subscriptable
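
If a position really is needed, two common workarounds are sketched below: sort the set into a list, or take one arbitrary element through an iterator without removing it.

```python
majors = {"Economics", "Sociology"}

# Sort into a list when a stable position is needed
ordered = sorted(majors)
print(ordered[0])  # Economics

# Take one arbitrary element without removing it from the set
any_major = next(iter(majors))
print(any_major in majors)  # True
print(len(majors))          # 2
```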

Error 2: Confusing Empty Set and Empty Dictionary

python

empty = {}         # ❌ This is empty dictionary
empty_set = set()  # ✅ This is empty set

print(type(empty))      # <class 'dict'>
print(type(empty_set))  # <class 'set'>

Error 3: Adding Mutable Objects

python

# ❌ Lists are mutable and unhashable, so they cannot be set elements
# s = {[1, 2], [3, 4]}  # TypeError: unhashable type: 'list'

# ✅ Tuples can
s = {(1, 2), (3, 4)}
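
The underlying rule is that set elements must be hashable. A tuple qualifies only if everything inside it is hashable too. The helper `can_go_in_set` below is a hypothetical name used for illustration; it simply tries the built-in `hash()`.

```python
def can_go_in_set(value):
    """Return True if value is hashable and so usable as a set element."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

print(can_go_in_set((1, 2)))           # True: tuple of ints
print(can_go_in_set([1, 2]))           # False: list
print(can_go_in_set((1, [2, 3])))      # False: tuple containing a list
print(can_go_in_set(frozenset({1})))   # True: frozenset

# Convert inner lists to tuples before building the set
pairs = [[1, 2], [3, 4], [1, 2]]
unique_pairs = {tuple(p) for p in pairs}
print(sorted(unique_pairs))  # [(1, 2), (3, 4)]
```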

💪 Practice Problems

Exercise 1: Deduplicate and Sort

python

# Respondent ages (with duplicates)
ages = [25, 30, 25, 35, 30, 40, 25, 28, 30, 35]

# Tasks:
# 1. Deduplicate
# 2. Sort from low to high
# 3. Output unique ages and count

Exercise 2: Survey Completeness Check

python

# Required fields
required_fields = {"id", "age", "gender", "income", "education"}

# Batch check
responses = [
    {"id", "age", "gender", "income", "education"},  # Complete
    {"id", "age", "gender", "income"},                # Missing education
    {"id", "age", "gender"},                          # Missing income, education
]

# Task: Check each response for completeness, output missing fields

Exercise 3: Common Friends

python

# Alice's friends
alice_friends = {"Bob", "Carol", "David", "Emma"}

# Bob's friends
bob_friends = {"Alice", "Carol", "Frank", "Grace"}

# Tasks:
# 1. Find common friends of Alice and Bob
# 2. Find people who are only Alice's friends
# 3. Find total number of friends (no duplicates)

📝 Summary

You've now covered Python's four core built-in data structures:

Data Structure	Ordered	Mutable	Duplicates	Use
List	✓	✓	✓	General sequences
Tuple	✓	✗	✓	Immutable data
Dict	*	✓	Keys unique	Key-value pairs
Set	✗	✓	✗	Deduplication, set operations

*Python 3.7+ dictionaries maintain insertion order

Next Step: We'll learn about Functions and Modules, making code more modular and reusable.

Ready? Keep going!
