Module 5 Summary
Mastering Code Reusability — From Basics to Best Practices
Key Concepts Review
1. Function Basics
Core Concepts:
- Functions are reusable code blocks (DRY: Don't Repeat Yourself)
- Functions encapsulate logic and improve code maintainability
Syntax:
def function_name(parameters):
    """Docstring (optional)"""
    # Function body
    return result
Return Values:
- Single value: return x
- Multiple values (tuple): return mean, max, min
- Dictionary: return {'mean': mean, 'max': max}
- No return value: return None (implicit)
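As a rough illustration of the return styles listed above, the sketch below shows a tuple return being unpacked, a dictionary return, and the implicit None. The helper names (`summarize`, `summarize_as_dict`, `log_values`) and the sample scores are made up for this example and are not part of the module.

```python
def summarize(values):
    """Return mean, maximum, and minimum as a tuple."""
    mean = sum(values) / len(values)
    return mean, max(values), min(values)


def summarize_as_dict(values):
    """Return the same statistics in a dictionary keyed by name."""
    mean, highest, lowest = summarize(values)  # tuple unpacking
    return {'mean': mean, 'max': highest, 'min': lowest}


def log_values(values):
    """No return statement, so the call evaluates to None."""
    for value in values:
        print(value)


scores = [70, 80, 90]  # hypothetical sample data
mean, highest, lowest = summarize(scores)
print(mean, highest, lowest)        # 80.0 90 70
print(summarize_as_dict(scores))    # {'mean': 80.0, 'max': 90, 'min': 70}
print(log_values(scores) is None)   # True (after printing the three scores)
```

A dictionary return is usually easier to read at the call site once a function returns more than two or three values, because the caller does not have to remember their order.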
2. Function Arguments
Parameter Type Summary
| Parameter Type | Syntax | Order / Behavior | Example |
|---|---|---|---|
| Positional | func(a, b) | Matters | power(2, 3) |
| Default | func(a, b=10) | Has default value | greet(name, greeting="Hello") |
| Keyword | func(a=1, b=2) | Doesn't matter | func(age=30, name="Alice") |
| Variable positional | func(*args) | Any number of args | sum(*numbers) |
| Variable keyword | func(**kwargs) | Any number of kwargs | create(**info) |
| Keyword-only | func(*, kwonly) | After * must use keyword | func(income, *, tax_rate=0.25) |
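The parameter types in the table can be tried out with a few small functions. This is only a sketch: `power`, `greet`, and `create` echo the example column, while `total` and `net_income` and all function bodies are assumptions made for illustration.

```python
def power(base, exponent):                  # positional parameters
    return base ** exponent


def greet(name, greeting="Hello"):          # parameter with a default value
    return f"{greeting}, {name}!"


def total(*numbers):                        # variable positional (packed into a tuple)
    return sum(numbers)


def create(**info):                         # variable keyword (packed into a dict)
    return info


def net_income(income, *, tax_rate=0.25):   # tax_rate is keyword-only
    return income * (1 - tax_rate)


print(power(2, 3))                      # 8
print(power(exponent=3, base=2))        # 8 (keyword arguments can come in any order)
print(greet("Alice"))                   # Hello, Alice!
print(greet("Alice", greeting="Hi"))    # Hi, Alice!
print(total(1, 2, 3))                   # 6
print(create(name="Alice", age=30))     # {'name': 'Alice', 'age': 30}
print(net_income(1000, tax_rate=0.5))   # 500.0

try:
    net_income(1000, 0.5)               # keyword-only parameter passed positionally
except TypeError as error:
    print("TypeError:", error)
```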
Parameter Order:
def function(
    pos1, pos2,        # Positional parameters
    default_arg=10,    # Default parameter
    *args,             # Variable positional parameters
    kwonly_arg,        # Keyword-only parameter
    **kwargs           # Variable keyword parameters
):
    pass
⚠️ Warning:
# ❌ Don't use mutable objects as default parameters
def add_item(item, items=[]):
    items.append(item)
    return items
# ✅ Correct approach
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
3. Lambda Functions
Purpose:
- Anonymous functions
- Single-line expressions
- Used for simple one-time operations
Syntax:
lambda parameters: expression
Lambda vs Regular Function
| Feature | Lambda | Regular Function |
|---|---|---|
| Syntax | lambda x: x**2 | def f(x): return x**2 |
| Name | Anonymous | Named |
| Complexity | Single-line expression | Multi-line |
| Documentation | Cannot add | Can add docstring |
| Use Cases | Simple, one-time | Complex, reusable |
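A typical place for a lambda is as a short, throwaway argument to `sorted`, `filter`, or `map`. The sketch below assumes a small, made-up list of respondents; the names and incomes are illustrative only.

```python
# Hypothetical sample data
respondents = [
    {'name': 'Alice', 'income': 50000},
    {'name': 'Bob', 'income': 80000},
    {'name': 'Chen', 'income': 65000},
]

# Sort by income, highest first
by_income = sorted(respondents, key=lambda r: r['income'], reverse=True)
print([r['name'] for r in by_income])       # ['Bob', 'Chen', 'Alice']

# Keep only respondents earning more than 60,000
high_earners = list(filter(lambda r: r['income'] > 60000, respondents))
print([r['name'] for r in high_earners])    # ['Bob', 'Chen']

# Convert each income to thousands
incomes_in_thousands = list(map(lambda r: r['income'] / 1000, respondents))
print(incomes_in_thousands)                 # [50.0, 80.0, 65.0]
```

If the expression grows beyond one short line or is needed in more than one place, a named function with a docstring is usually the better choice.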
4. Modules
Concepts:
- Module: Single .py file
- Package: Folder containing __init__.py
- Library: Collection of related packages
Import Methods:
# 1. Import entire module
import math
math.sqrt(16)
# 2. Import with alias
import pandas as pd
# 3. Import specific functions
from math import sqrt, pi
# 4. Import all (NOT recommended)
from math import *  # ❌ Can cause name conflicts
Common Standard Libraries
| Library | Purpose | Key Functions |
|---|---|---|
| math | Mathematical operations | sqrt(), log(), exp(), pi, e |
| statistics | Statistics | mean(), median(), stdev(), variance() |
| random | Random numbers | randint(), random(), choice(), sample() |
| datetime | Date and time | datetime.now(), timedelta(), strftime() |
| json | JSON data handling | dumps(), loads(), dump(), load() |
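The following sketch exercises one or two functions from each library in the table. The sample incomes, the seed, and the start date are arbitrary values chosen for illustration.

```python
import json
import math
import random
import statistics
from datetime import datetime, timedelta

incomes = [30000, 50000, 70000]  # hypothetical sample data

# math and statistics
print(math.sqrt(16))                        # 4.0
print(statistics.median(incomes))           # 50000
print(statistics.mean(incomes) == 50000)    # True

# random: fix the seed so repeated runs pick the same element
random.seed(0)
pick = random.choice(incomes)
print(pick in incomes)                      # True

# datetime: date arithmetic and formatting
start = datetime(2024, 1, 1)
deadline = start + timedelta(days=30)
print(deadline.strftime("%Y-%m-%d"))        # 2024-01-31

# json: round-trip a dictionary through a string
text = json.dumps({'mean': 50000})
print(text)                                 # {"mean": 50000}
print(json.loads(text)['mean'])             # 50000
```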
Comparison: Python vs Stata vs R
Function Definition Syntax
Python:
def calculate_mean(data):
    return sum(data) / len(data)
Stata:
program define calc_mean, rclass
    args varname
    summarize `varname'
    return scalar mean = r(mean)
end
R:
calculate_mean <- function(data) {
    mean(data)
}
Package Management
| Task | Python | Stata | R |
|---|---|---|---|
| Install | pip install pandas | ssc install outreg2 | install.packages("dplyr") |
| Import | import pandas as pd | which outreg2 | library(dplyr) |
| List installed | pip list | ado dir | installed.packages() |
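On the Python side, the same questions ("is it installed?", "which version?") can also be asked from inside a script. The sketch below uses the standard library's `importlib`; the helper names `is_installed` and `installed_version` are assumptions for this example, and it is one possible approach, not the only one.

```python
from importlib import metadata, util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return util.find_spec(module_name) is not None


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


print(is_installed("math"))           # True (part of the standard library)
print(installed_version("pandas"))    # a version string, or None if pandas is absent
```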
Common Pitfalls
1. Don't Use Mutable Objects as Default Parameters
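To see why this matters, it helps to call such a function twice. In the sketch below the function names `add_student_shared` and `add_student_fresh` are made up so both versions can coexist; the bodies mirror the pattern used in this section.

```python
def add_student_shared(name, courses=[]):
    # The default list is created once, when the function is defined,
    # and then reused by every call that omits `courses`.
    courses.append(name)
    return courses


first = add_student_shared("Alice")
second = add_student_shared("Bob")
print(second)             # ['Alice', 'Bob']
print(first is second)    # True: both calls returned the same list object


def add_student_fresh(name, courses=None):
    # None is a safe sentinel; a new list is built on each call.
    if courses is None:
        courses = []
    courses.append(name)
    return courses


print(add_student_fresh("Alice"))   # ['Alice']
print(add_student_fresh("Bob"))     # ['Bob']
```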
# ❌ Wrong
def add_student(name, courses=[]):
    courses.append(name)
    return courses
# ✅ Correct
def add_student(name, courses=None):
    if courses is None:
        courses = []
    courses.append(name)
    return courses
2. Remember to Use return
# ❌ Wrong
def calculate_tax(income):
    tax = income * 0.25  # Missing return
# ✅ Correct
def calculate_tax(income):
    return income * 0.25
3. Parameter Order Matters
# ❌ Wrong
def register(name="Alice", age, major):  # SyntaxError
    pass
# ✅ Correct
def register(age, major, name="Alice"):
    pass
Best Practices
1. Function Naming
# ✅ Good naming (verb-based, descriptive)
def calculate_total(price, quantity): ...
def validate_email(email): ...
def is_adult(age): ...
# ❌ Bad naming
def f(x, y): ...
def data(): ...
2. Single Responsibility
# ❌ Function does too many things
def process_data(data):
    # Clean + Analyze + Plot + Save
    pass
# ✅ Split into multiple functions
def clean_data(data):
    pass
def analyze_data(data):
    pass
3. Write Docstrings
def calculate_gini(incomes):
    """Calculate Gini coefficient

    Parameters:
        incomes (list): List of income values
    Returns:
        float: Gini coefficient (0-1)
    Example:
        >>> calculate_gini([10000, 20000, 30000])
        0.2222
    """
    pass
Comprehensive Exercises
Exercise 1: Progressive Tax Calculator
Difficulty: ⭐⭐⭐ Time: 15 minutes
Write a function that calculates progressive tax:
def calculate_progressive_tax(income, brackets):
    """
    Calculate progressive tax
    Parameters:
        income: Total income
        brackets: Tax brackets [(limit, rate), ...]
    """
    pass
# Test
brackets = [(50000, 0.10), (100000, 0.20), (float('inf'), 0.30)]
print(calculate_progressive_tax(75000, brackets))  # 10000.0 (50000 * 0.10 + 25000 * 0.20)
Solution:
def calculate_progressive_tax(income, brackets):
    """Calculate progressive tax"""
    tax = 0
    previous_limit = 0
    for limit, rate in brackets:
        if income <= previous_limit:
            break
        # Only the slice of income that falls inside this bracket is taxed at its rate
        taxable = min(income, limit) - previous_limit
        tax += taxable * rate
        previous_limit = limit
    return tax
# Test
brackets = [(50000, 0.10), (100000, 0.20), (float('inf'), 0.30)]
assert calculate_progressive_tax(40000, brackets) == 4000    # 40000 * 0.10
assert calculate_progressive_tax(75000, brackets) == 10000   # 5000 + 25000 * 0.20
assert calculate_progressive_tax(120000, brackets) == 21000  # 5000 + 10000 + 20000 * 0.30
print("All tests passed!")
Exercise 2: Data Filter with Flexible Criteria
Difficulty: ⭐⭐⭐ Time: 20 minutes
def filter_respondents(data, **criteria):
    """
    Filter respondents by any criteria
    Possible criteria:
        min_age, max_age, gender, min_income, education, city
    """
    pass
# Test
respondents = [
    {'id': 1, 'age': 25, 'gender': 'F', 'income': 50000},
    {'id': 2, 'age': 35, 'gender': 'M', 'income': 80000},
]
result = filter_respondents(respondents, min_age=30, gender='M')
Solution:
def filter_respondents(data, **criteria):
    """Filter respondents by any criteria"""
    filtered = []
    for person in data:
        match = True
        # Check age
        if 'min_age' in criteria and person.get('age', 0) < criteria['min_age']:
            match = False
        if 'max_age' in criteria and person.get('age', 999) > criteria['max_age']:
            match = False
        # Check income
        if 'min_income' in criteria and person.get('income', 0) < criteria['min_income']:
            match = False
        # Check exact matches
        for field in ['gender', 'education', 'city']:
            if field in criteria and person.get(field) != criteria[field]:
                match = False
        if match:
            filtered.append(person)
    return filtered
Exercise 3: Data Validation Module
Difficulty: ⭐⭐⭐ Time: 30 minutes
Create a module with validation functions:
def validate_age(age, min_age=18, max_age=100):
    """Validate age, return (is_valid, error_message)"""
    pass
def validate_income(income, min_income=0):
    """Validate income"""
    pass
def validate_email(email):
    """Validate email format"""
    pass
def validate_response(response, rules):
    """Validate entire response"""
    pass
Solution (partial):
def validate_age(age, min_age=18, max_age=100):
    if not isinstance(age, (int, float)):
        return False, "Age must be a number"
    if age < min_age:
        return False, f"Age must be at least {min_age}"
    if age > max_age:
        return False, f"Age cannot exceed {max_age}"
    return True, ""
def validate_email(email):
    if not isinstance(email, str):
        return False, "Email must be a string"
    if '@' not in email or '.' not in email.split('@')[1]:
        return False, "Invalid email format"
    return True, ""
def validate_response(response, rules=None):
    errors = []
    if 'age' in response:
        if rules and 'age' in rules:
            min_age, max_age = rules['age']
            is_valid, error = validate_age(response['age'], min_age, max_age)
        else:
            is_valid, error = validate_age(response['age'])
        if not is_valid:
            errors.append(f"Age: {error}")
    if 'email' in response:
        is_valid, error = validate_email(response['email'])
        if not is_valid:
            errors.append(f"Email: {error}")
    return len(errors) == 0, errors
Exercise 4: Data Processing Pipeline
Difficulty: ⭐⭐⭐ Time: 30 minutes
def create_pipeline(*functions):
    """Create data processing pipeline"""
    pass
# Example
normalize = lambda x: x / 10000
discount = lambda x: x * 0.8
round_result = lambda x: round(x, 2)
pipeline = create_pipeline(normalize, discount, round_result)
result = pipeline(75000)  # 6.0
Solution:
from functools import reduce
def create_pipeline(*functions):
    """Create data processing pipeline"""
    def pipeline(data):
        result = data
        for func in functions:
            result = func(result)
        return result
    return pipeline
# Alternative using reduce
def create_pipeline_v2(*functions):
    return lambda data: reduce(lambda x, f: f(x), functions, data)
# Test
normalize = lambda x: x / 10000
discount = lambda x: x * 0.8
round_result = lambda x: round(x, 2)
pipeline = create_pipeline(normalize, discount, round_result)
print(pipeline(75000))   # 6.0
print(pipeline(120000))  # 9.6
Exercise 5: Module Organization
Difficulty: ⭐⭐⭐ Time: 40 minutes
Create a module structure for survey analysis:
survey_project/
├── utils/
│   ├── __init__.py
│   ├── validation.py
│   └── stats.py
├── analysis/
│   ├── __init__.py
│   └── descriptive.py
└── main.py
Solution structure:
# utils/stats.py
def calculate_mean(values):
    return sum(values) / len(values) if values else 0
def calculate_median(values):
    sorted_values = sorted(values)
    n = len(sorted_values)
    if n == 0:
        return 0  # Match calculate_mean: an empty list yields 0 instead of an IndexError
    if n % 2 == 0:
        return (sorted_values[n//2-1] + sorted_values[n//2]) / 2
    return sorted_values[n//2]
# analysis/descriptive.py
from utils.stats import calculate_mean, calculate_median
def describe_variable(data, variable):
    values = [record[variable] for record in data if variable in record]
    return {
        'count': len(values),
        'mean': calculate_mean(values),
        'median': calculate_median(values),
        'min': min(values) if values else 0,
        'max': max(values) if values else 0
    }
# main.py
from analysis.descriptive import describe_variable
data = [
    {'age': 25, 'income': 50000},
    {'age': 35, 'income': 80000},
]
stats = describe_variable(data, 'income')
print(stats)
Exercise 6: Income Inequality Analysis
Difficulty: ⭐⭐⭐⭐ Time: 45 minutes
Calculate income inequality measures:
def calculate_gini(incomes):
    """Calculate Gini coefficient"""
    pass
def calculate_quintiles(incomes):
    """Calculate income quintiles"""
    pass
def analyze_inequality(data):
    """Comprehensive income inequality analysis"""
    pass
Solution (Gini coefficient):
def calculate_gini(incomes):
    """Calculate Gini coefficient"""
    valid_incomes = [inc for inc in incomes if inc > 0]
    if len(valid_incomes) <= 1:
        return 0.0
    sorted_incomes = sorted(valid_incomes)
    n = len(sorted_incomes)
    # Formula: G = (2 * Σ(i * x_i)) / (n * Σx_i) - (n+1)/n, with i the 1-based rank
    numerator = sum((i + 1) * income for i, income in enumerate(sorted_incomes))
    denominator = n * sum(sorted_incomes)
    gini = (2 * numerator) / denominator - (n + 1) / n
    return round(gini, 4)
# Test
incomes = [30000, 50000, 75000, 120000, 200000]
print(f"Gini coefficient: {calculate_gini(incomes)}")  # 0.3453
Exercise 7: Recursive Functions
Difficulty: ⭐⭐⭐⭐ Time: 35 minutes
Implement recursive algorithms:
organization = {
    'name': 'CEO',
    'salary': 500000,
    'subordinates': [
        {'name': 'VP', 'salary': 300000, 'subordinates': []}
    ]
}
def count_employees(org):
    """Recursively count employees"""
    pass
def calculate_total_salary(org):
    """Recursively calculate total salary"""
    pass
Solution:
def count_employees(org):
    """Recursively count employees"""
    return 1 + sum(count_employees(sub) for sub in org.get('subordinates', []))
def calculate_total_salary(org):
    """Recursively calculate total salary"""
    return org['salary'] + sum(
        calculate_total_salary(sub) for sub in org.get('subordinates', [])
    )
def get_max_depth(org, current_depth=1):
    """Recursively calculate organizational depth"""
    subordinates = org.get('subordinates', [])
    if not subordinates:
        return current_depth
    return max(get_max_depth(sub, current_depth + 1) for sub in subordinates)
# Test
organization = {
    'name': 'CEO',
    'salary': 500000,
    'subordinates': [
        {
            'name': 'VP',
            'salary': 300000,
            'subordinates': [
                {'name': 'Manager', 'salary': 150000, 'subordinates': []}
            ]
        }
    ]
}
print(f"Employees: {count_employees(organization)}")  # 3
print(f"Total salary: ${calculate_total_salary(organization):,}")  # $950,000
print(f"Depth: {get_max_depth(organization)}")  # 3
Key Takeaways
You've now mastered:
- ✅ Function definition and calling
- ✅ All parameter types
- ✅ Lambda functions and functional programming
- ✅ Module and package management
You've completed Module 5! 🎉
In Module 6, we'll dive into Object-Oriented Programming (OOP).
Ready to tackle OOP? Let's go! 🚀