Module 5 Summary

Mastering Code Reusability — From Basics to Best Practices

Key Concepts Review

1. Function Basics

Core Concepts:

Functions are reusable code blocks (DRY: Don't Repeat Yourself)
Functions encapsulate logic and improve code maintainability

Syntax:

python

def function_name(parameters):
    """Docstring (optional)"""
    # Function body
    return result

Return Values:

Single value: return x
Multiple values (tuple): return mean, max, min
Dictionary: return {'mean': mean, 'max': max}
No return value: return None (implicit)
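
The four return styles above can be compared side by side. This is a minimal sketch; the function names `summarize`, `summarize_as_dict`, and `log_values` are hypothetical and chosen only for illustration.

```python
def summarize(values):
    """Return several statistics at once; Python packs them into a tuple."""
    mean = sum(values) / len(values)
    return mean, max(values), min(values)

def summarize_as_dict(values):
    """Return the same statistics keyed by name."""
    mean, highest, lowest = summarize(values)
    return {'mean': mean, 'max': highest, 'min': lowest}

def log_values(values):
    """No return statement, so the call evaluates to None."""
    print(f"Received {len(values)} values")

scores = [70, 85, 100]
mean, highest, lowest = summarize(scores)   # tuple unpacking
stats = summarize_as_dict(scores)           # access by key: stats['mean']
nothing = log_values(scores)                # nothing is None
```

A dictionary is usually easier to extend later, because callers look results up by name rather than by position.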

2. Function Arguments

Parameter Type Summary

Parameter Type	Syntax	Notes	Example
Positional	`func(a, b)`	Order matters	`power(2, 3)`
Default	`func(a, b=10)`	Has a default value	`greet(name, greeting="Hello")`
Keyword	`func(a=1, b=2)`	Order doesn't matter	`func(age=30, name="Alice")`
Variable positional	`func(*args)`	Any number of positional args	`print(*numbers)`
Variable keyword	`func(**kwargs)`	Any number of keyword args	`create(**info)`
Keyword-only	`func(*, kwonly)`	Parameters after `*` must be passed by keyword	`func(income, *, tax_rate=0.25)`

Parameter Order:

python

def function(
    pos1, pos2,              # Positional parameters
    default_arg=10,          # Default parameter
    *args,                   # Variable positional parameters
    kwonly_arg,              # Keyword-only parameter
    **kwargs                 # Variable keyword parameters
):
    pass
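
To see how arguments bind to a signature in this order, the sketch below returns everything it received. The name `describe_call` is hypothetical; the parameter names mirror the signature above.

```python
def describe_call(pos1, pos2, default_arg=10, *args, kwonly_arg, **kwargs):
    """Return the bound arguments so the binding rules are visible."""
    return {
        'pos1': pos1,
        'pos2': pos2,
        'default_arg': default_arg,
        'args': args,
        'kwonly_arg': kwonly_arg,
        'kwargs': kwargs,
    }

# Only the required parameters: default_arg keeps its default, args and kwargs stay empty
minimal = describe_call(1, 2, kwonly_arg='k')

# Extra positionals fill default_arg first, then spill into args;
# unknown keywords are collected in kwargs
full = describe_call(1, 2, 3, 4, 5, kwonly_arg='k', extra=True)
```

Note that `kwonly_arg` has no default here, so every call must pass it by keyword.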

⚠️ Warning:

python

# ❌ Don't use mutable objects as default parameters:
#    the default list is created once and shared across calls
def add_item(item, items=[]):
    items.append(item)
    return items

# ✅ Correct approach
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
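
The difference only shows up across repeated calls. The sketch below renames the two versions (`add_item_shared` and `add_item_fresh`, both hypothetical names) so they can be run together.

```python
def add_item_shared(item, items=[]):      # default list is created once, at definition time
    items.append(item)
    return items

def add_item_fresh(item, items=None):     # a new list is created on each call
    if items is None:
        items = []
    items.append(item)
    return items

first_shared = add_item_shared('a')
second_shared = add_item_shared('b')      # same list object as first_shared

first_fresh = add_item_fresh('a')
second_fresh = add_item_fresh('b')        # independent list

print(second_shared)   # both 'a' and 'b' accumulate in the shared default
print(second_fresh)    # only 'b'
```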

3. Lambda Functions

Purpose:

Anonymous functions
Limited to a single expression
Used for simple one-time operations

Syntax:

python

lambda parameters: expression

Lambda vs Regular Function

Feature	Lambda	Regular Function
Syntax	`lambda x: x**2`	`def f(x): return x**2`
Name	Anonymous	Named
Complexity	Single expression	Multiple statements
Documentation	No docstring syntax	Can add docstring
Use Cases	Simple, one-time	Complex, reusable
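
A typical "simple, one-time" use is passing a lambda as the `key` or callback argument of a built-in. The sketch below uses made-up respondent records for illustration.

```python
respondents = [
    {'name': 'Alice', 'income': 80000},
    {'name': 'Bob', 'income': 50000},
    {'name': 'Carol', 'income': 65000},
]

# Lambda as a throwaway sort key
by_income = sorted(respondents, key=lambda r: r['income'])
names_by_income = [r['name'] for r in by_income]

# Lambdas with map() and filter()
incomes_in_thousands = list(map(lambda r: r['income'] / 1000, respondents))
high_earners = list(filter(lambda r: r['income'] > 60000, respondents))
```

If the same expression is needed in more than one place, or needs a docstring, a named `def` is the better fit.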

4. Modules

Concepts:

Module: Single .py file
Package: Folder containing __init__.py
Library: Collection of related packages

Import Methods:

python

# 1. Import entire module
import math
math.sqrt(16)

# 2. Import with alias
import pandas as pd

# 3. Import specific functions
from math import sqrt, pi

# 4. Import all (NOT recommended)
from math import *  # ❌ Can cause name conflicts

Common Standard Libraries

Library	Purpose	Key Functions
`math`	Mathematical operations	`sqrt()`, `log()`, `exp()`, `pi`, `e`
`statistics`	Statistics	`mean()`, `median()`, `stdev()`, `variance()`
`random`	Random numbers	`randint()`, `random()`, `choice()`, `sample()`
`datetime`	Date and time	`datetime.now()`, `timedelta()`, `strftime()`
`json`	JSON data handling	`dumps()`, `loads()`, `dump()`, `load()`
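
A short sketch that touches several of the libraries in the table. The income figures and survey dates are made up for illustration.

```python
import json
import math
import statistics
from datetime import datetime, timedelta

incomes = [30000, 50000, 70000]

summary = {
    'mean': statistics.mean(incomes),
    'median': statistics.median(incomes),
    'stdev': statistics.stdev(incomes),            # sample standard deviation
    'log_mean': math.log(statistics.mean(incomes)),
}

# Round-trip the summary through a JSON string
encoded = json.dumps(summary)
decoded = json.loads(encoded)

# Date arithmetic and formatting
survey_start = datetime(2024, 1, 15)
survey_end = survey_start + timedelta(days=30)
label = survey_end.strftime('%Y-%m-%d')
```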

Comparison: Python vs Stata vs R

Function Definition Syntax

Python:

python

def calculate_mean(data):
    return sum(data) / len(data)

Stata:

stata

program define calc_mean, rclass
    args varname
    summarize `varname'
    return scalar mean = r(mean)
end

R:

r

calculate_mean <- function(data) {
  mean(data)
}

Package Management

Task	Python	Stata	R
Install	`pip install pandas`	`ssc install outreg2`	`install.packages("dplyr")`
Import	`import pandas as pd`	`which outreg2`	`library(dplyr)`
List installed	`pip list`	`ado dir`	`installed.packages()`

Common Pitfalls

1. Don't Use Mutable Objects as Default Parameters

python

# ❌ Wrong
def add_student(name, courses=[]):
    courses.append(name)
    return courses

# ✅ Correct
def add_student(name, courses=None):
    if courses is None:
        courses = []
    courses.append(name)
    return courses

2. Remember to Use return

python

# ❌ Wrong
def calculate_tax(income):
    tax = income * 0.25  # Missing return

# ✅ Correct
def calculate_tax(income):
    return income * 0.25

3. Parameter Order Matters

python

# ❌ Wrong
def register(name="Alice", age, major):  # SyntaxError
    pass

# ✅ Correct
def register(age, major, name="Alice"):
    pass

Best Practices

1. Function Naming

python

# ✅ Good naming (verb-based, descriptive)
def calculate_total(price, quantity): ...
def validate_email(email): ...
def is_adult(age): ...

# ❌ Bad naming
def f(x, y): ...
def data(): ...

2. Single Responsibility

python

# ❌ Function does too many things
def process_data(data):
    # Clean + Analyze + Plot + Save
    pass

# ✅ Split into multiple functions
def clean_data(data):
    pass

def analyze_data(data):
    pass

3. Write Docstrings

python

def calculate_gini(incomes):
    """Calculate Gini coefficient

    Parameters:
        incomes (list): List of income values

    Returns:
        float: Gini coefficient (0-1)

    Example:
        >>> calculate_gini([10000, 20000, 30000])
        0.2222
    """
    pass
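
A docstring is stored on the function object, so tools can read it at runtime. The sketch below uses a hypothetical `calculate_range` function and the standard `inspect.getdoc()` helper.

```python
import inspect

def calculate_range(values):
    """Return the gap between the largest and smallest value.

    Parameters:
        values (list): Non-empty list of numbers

    Returns:
        number: max(values) - min(values)
    """
    return max(values) - min(values)

# inspect.getdoc() strips the indentation; help(calculate_range) prints the same text
doc_text = inspect.getdoc(calculate_range)
summary_line = doc_text.splitlines()[0]
```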

Comprehensive Exercises

Exercise 1: Progressive Tax Calculator

Difficulty: ⭐⭐⭐ Time: 15 minutes

Write a function that calculates progressive tax:

python

def calculate_progressive_tax(income, brackets):
    """
    Calculate progressive tax

    Parameters:
        income: Total income
        brackets: Tax brackets [(limit, rate), ...]
    """
    pass

# Test
brackets = [(50000, 0.10), (100000, 0.20), (float('inf'), 0.30)]
print(calculate_progressive_tax(75000, brackets))  # 10000.0 (50000 * 0.10 + 25000 * 0.20)

💡 View Solution

python

def calculate_progressive_tax(income, brackets):
    """Calculate progressive tax"""
    tax = 0
    previous_limit = 0

    for limit, rate in brackets:
        if income <= previous_limit:
            break

        taxable = min(income, limit) - previous_limit
        tax += taxable * rate
        previous_limit = limit

    return tax

# Test
brackets = [(50000, 0.10), (100000, 0.20), (float('inf'), 0.30)]
assert calculate_progressive_tax(40000, brackets) == 4000
assert calculate_progressive_tax(75000, brackets) == 10000   # 5000 + 25000 * 0.20
assert calculate_progressive_tax(120000, brackets) == 21000  # 5000 + 10000 + 20000 * 0.30
print("All tests passed!")

Exercise 2: Data Filter with Flexible Criteria

Difficulty: ⭐⭐⭐ Time: 20 minutes

python

def filter_respondents(data, **criteria):
    """
    Filter respondents by any criteria

    Possible criteria:
        min_age, max_age, gender, min_income, education, city
    """
    pass

# Test
respondents = [
    {'id': 1, 'age': 25, 'gender': 'F', 'income': 50000},
    {'id': 2, 'age': 35, 'gender': 'M', 'income': 80000},
]

result = filter_respondents(respondents, min_age=30, gender='M')

💡 View Solution

python

def filter_respondents(data, **criteria):
    """Filter respondents by any criteria"""
    filtered = []

    for person in data:
        match = True

        # Check age
        if 'min_age' in criteria and person.get('age', 0) < criteria['min_age']:
            match = False
        if 'max_age' in criteria and person.get('age', 999) > criteria['max_age']:
            match = False

        # Check income
        if 'min_income' in criteria and person.get('income', 0) < criteria['min_income']:
            match = False

        # Check exact matches
        for field in ['gender', 'education', 'city']:
            if field in criteria and person.get(field) != criteria[field]:
                match = False

        if match:
            filtered.append(person)

    return filtered

Exercise 3: Data Validation Module

Difficulty: ⭐⭐⭐ Time: 30 minutes

Create a module with validation functions:

python

def validate_age(age, min_age=18, max_age=100):
    """Validate age, return (is_valid, error_message)"""
    pass

def validate_income(income, min_income=0):
    """Validate income"""
    pass

def validate_email(email):
    """Validate email format"""
    pass

def validate_response(response, rules=None):
    """Validate entire response"""
    pass

💡 View Solution (Partial)

python

def validate_age(age, min_age=18, max_age=100):
    if not isinstance(age, (int, float)):
        return False, "Age must be a number"
    if age < min_age:
        return False, f"Age must be at least {min_age}"
    if age > max_age:
        return False, f"Age cannot exceed {max_age}"
    return True, ""

def validate_email(email):
    if not isinstance(email, str):
        return False, "Email must be a string"
    if '@' not in email or '.' not in email.split('@')[1]:
        return False, "Invalid email format"
    return True, ""

def validate_response(response, rules=None):
    errors = []

    if 'age' in response:
        if rules and 'age' in rules:
            min_age, max_age = rules['age']
            is_valid, error = validate_age(response['age'], min_age, max_age)
        else:
            is_valid, error = validate_age(response['age'])
        if not is_valid:
            errors.append(f"Age: {error}")

    if 'email' in response:
        is_valid, error = validate_email(response['email'])
        if not is_valid:
            errors.append(f"Email: {error}")

    return len(errors) == 0, errors
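
The partial solution leaves `validate_income` open. One possible sketch, following the same `(is_valid, error_message)` convention as `validate_age`, is shown below. Rejecting booleans is an assumption about the intended behavior, not something the exercise specifies.

```python
def validate_income(income, min_income=0):
    """Validate income, returning (is_valid, error_message)."""
    # bool is a subclass of int, so reject it explicitly (assumed intent)
    if isinstance(income, bool) or not isinstance(income, (int, float)):
        return False, "Income must be a number"
    if income < min_income:
        return False, f"Income must be at least {min_income}"
    return True, ""

checks = [validate_income(50000), validate_income(-1), validate_income("50k")]
```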

Exercise 4: Data Processing Pipeline

Difficulty: ⭐⭐⭐ Time: 30 minutes

python

def create_pipeline(*functions):
    """Create data processing pipeline"""
    pass

# Example
normalize = lambda x: x / 10000
discount = lambda x: x * 0.8
round_result = lambda x: round(x, 2)

pipeline = create_pipeline(normalize, discount, round_result)
result = pipeline(75000)  # 6.0

💡 View Solution

python

from functools import reduce

def create_pipeline(*functions):
    """Create data processing pipeline"""
    def pipeline(data):
        result = data
        for func in functions:
            result = func(result)
        return result
    return pipeline

# Alternative using reduce
def create_pipeline_v2(*functions):
    return lambda data: reduce(lambda x, f: f(x), functions, data)

# Test
normalize = lambda x: x / 10000
discount = lambda x: x * 0.8
round_result = lambda x: round(x, 2)

pipeline = create_pipeline(normalize, discount, round_result)
print(pipeline(75000))   # 6.0
print(pipeline(120000))  # 9.6

Exercise 5: Module Organization

Difficulty: ⭐⭐⭐ Time: 40 minutes

Create a module structure for survey analysis:

survey_project/
├── utils/
│   ├── __init__.py
│   ├── validation.py
│   └── stats.py
├── analysis/
│   ├── __init__.py
│   └── descriptive.py
└── main.py

💡 View Solution Structure

python

# utils/stats.py
def calculate_mean(values):
    return sum(values) / len(values) if values else 0

def calculate_median(values):
    if not values:
        return 0  # Match calculate_mean: avoid IndexError on empty input
    sorted_values = sorted(values)
    n = len(sorted_values)
    if n % 2 == 0:
        return (sorted_values[n//2-1] + sorted_values[n//2]) / 2
    return sorted_values[n//2]

# analysis/descriptive.py
from utils.stats import calculate_mean, calculate_median

def describe_variable(data, variable):
    values = [record[variable] for record in data if variable in record]
    return {
        'count': len(values),
        'mean': calculate_mean(values),
        'median': calculate_median(values),
        'min': min(values) if values else 0,
        'max': max(values) if values else 0
    }

# main.py
from analysis.descriptive import describe_variable

data = [
    {'age': 25, 'income': 50000},
    {'age': 35, 'income': 80000},
]

stats = describe_variable(data, 'income')
print(stats)

Exercise 6: Income Inequality Analysis

Difficulty: ⭐⭐⭐⭐ Time: 45 minutes

Calculate income inequality measures:

python

def calculate_gini(incomes):
    """Calculate Gini coefficient"""
    pass

def calculate_quintiles(incomes):
    """Calculate income quintiles"""
    pass

def analyze_inequality(data):
    """Comprehensive income inequality analysis"""
    pass

💡 View Solution (Gini Coefficient)

python

def calculate_gini(incomes):
    """Calculate Gini coefficient"""
    valid_incomes = [inc for inc in incomes if inc > 0]
    if len(valid_incomes) <= 1:
        return 0.0

    sorted_incomes = sorted(valid_incomes)
    n = len(sorted_incomes)

    # Formula: G = (2 * Σ(i * x_i)) / (n * Σx_i) - (n+1)/n
    numerator = sum((i + 1) * income for i, income in enumerate(sorted_incomes))
    denominator = n * sum(sorted_incomes)

    gini = (2 * numerator) / denominator - (n + 1) / n
    return round(gini, 4)

# Test
incomes = [30000, 50000, 75000, 120000, 200000]
print(f"Gini coefficient: {calculate_gini(incomes)}")  # 0.3453
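
For `calculate_quintiles`, one possible sketch leans on `statistics.quantiles()` from the standard library (Python 3.8+). Filtering out non-positive incomes mirrors the Gini solution above; the `'inclusive'` method and the empty-list fallback are assumptions, not requirements from the exercise.

```python
import statistics

def calculate_quintiles(incomes):
    """Return the four cut points that split incomes into five equal-sized groups."""
    valid_incomes = [inc for inc in incomes if inc > 0]
    if len(valid_incomes) < 2:
        return []  # assumed fallback: too few data points to compute cut points
    # n=5 yields 4 cut points; 'inclusive' treats the data as the full population
    return statistics.quantiles(valid_incomes, n=5, method='inclusive')

incomes = [30000, 50000, 75000, 120000, 200000]
cut_points = calculate_quintiles(incomes)
```

The result is a list of thresholds (20th, 40th, 60th, and 80th percentiles), not the groups themselves; assigning each respondent to a quintile would be a separate step.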

Exercise 7: Recursive Functions

Difficulty: ⭐⭐⭐⭐ Time: 35 minutes

Implement recursive algorithms:

python

organization = {
    'name': 'CEO',
    'salary': 500000,
    'subordinates': [
        {'name': 'VP', 'salary': 300000, 'subordinates': []}
    ]
}

def count_employees(org):
    """Recursively count employees"""
    pass

def calculate_total_salary(org):
    """Recursively calculate total salary"""
    pass

💡 View Solution

python

def count_employees(org):
    """Recursively count employees"""
    return 1 + sum(count_employees(sub) for sub in org.get('subordinates', []))

def calculate_total_salary(org):
    """Recursively calculate total salary"""
    return org['salary'] + sum(
        calculate_total_salary(sub) for sub in org.get('subordinates', [])
    )

def get_max_depth(org, current_depth=1):
    """Recursively calculate organizational depth"""
    subordinates = org.get('subordinates', [])
    if not subordinates:
        return current_depth
    return max(get_max_depth(sub, current_depth + 1) for sub in subordinates)

# Test
organization = {
    'name': 'CEO',
    'salary': 500000,
    'subordinates': [
        {
            'name': 'VP',
            'salary': 300000,
            'subordinates': [
                {'name': 'Manager', 'salary': 150000, 'subordinates': []}
            ]
        }
    ]
}

print(f"Employees: {count_employees(organization)}")           # 3
print(f"Total salary: ${calculate_total_salary(organization):,}") # $950,000
print(f"Depth: {get_max_depth(organization)}")           # 3

Key Takeaways

You've now mastered:

✅ Function definition and calling
✅ All parameter types
✅ Lambda functions and functional programming
✅ Module and package management

You've completed Module 5! 🎉

In Module 6, we'll dive into Object-Oriented Programming (OOP).

Additional Resources

Ready to tackle OOP? Let's go! 🚀
