
Module 5 Summary

Mastering Code Reusability — From Basics to Best Practices


Key Concepts Review

1. Function Basics

Core Concepts:

  • Functions are reusable code blocks (DRY: Don't Repeat Yourself)
  • Functions encapsulate logic and improve code maintainability

Syntax:

python
def function_name(parameters):
    """Docstring (optional)"""
    # Function body
    return result

Return Values:

  • Single value: return x
  • Multiple values (tuple): return mean, max, min
  • Dictionary: return {'mean': mean, 'max': max}
  • No return value: a function without a return statement returns None implicitly
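
The four return styles listed above can be compared side by side. This is a minimal sketch; the function names `summarize`, `summarize_as_dict`, and `log_values` are made up for illustration.

```python
def summarize(values):
    """Return three statistics at once; Python packs them into a tuple."""
    return sum(values) / len(values), max(values), min(values)

def summarize_as_dict(values):
    """Return statistics keyed by name instead of by position."""
    return {'mean': sum(values) / len(values), 'max': max(values)}

def log_values(values):
    """No return statement, so a call evaluates to None."""
    print(f"Received {len(values)} values")

scores = [70, 85, 100]

mean, highest, lowest = summarize(scores)  # tuple unpacking
stats = summarize_as_dict(scores)
nothing = log_values(scores)

print(mean, highest, lowest)  # 85.0 100 70
print(stats['max'])           # 100
print(nothing)                # None
```

A tuple is convenient when the caller unpacks every value; a dictionary is easier to read when only some values are needed.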

2. Function Arguments

Parameter Type Summary

| Parameter Type | Syntax | Order | Example |
|---|---|---|---|
| Positional | func(a, b) | Matters | power(2, 3) |
| Default | func(a, b=10) | Has default value | greet(name, greeting="Hello") |
| Keyword | func(a=1, b=2) | Doesn't matter | func(age=30, name="Alice") |
| Variable positional | func(*args) | Any number of args | total(*numbers) |
| Variable keyword | func(**kwargs) | Any number of kwargs | create(**info) |
| Keyword-only | func(*, kwonly) | After * must use keyword | func(income, *, tax_rate=0.25) |

Parameter Order:

python
def function(
    pos1, pos2,              # Positional parameters
    default_arg=10,          # Default parameter
    *args,                   # Variable positional parameters
    kwonly_arg,              # Keyword-only parameter
    **kwargs                 # Variable keyword parameters
):
    pass
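
To see how call arguments bind to a signature in this order, the sketch below returns each parameter's value in a dictionary. The name `describe_call` is hypothetical; the parameter names mirror the signature above.

```python
def describe_call(pos1, pos2, default_arg=10, *args, kwonly_arg, **kwargs):
    """Return a dict showing where each argument ended up."""
    return {
        'pos1': pos1, 'pos2': pos2, 'default_arg': default_arg,
        'args': args, 'kwonly_arg': kwonly_arg, 'kwargs': kwargs,
    }

# Only the required arguments: default_arg keeps its default, *args is empty.
minimal = describe_call(1, 2, kwonly_arg='k')
print(minimal['default_arg'], minimal['args'])  # 10 ()

# A third positional value fills default_arg; the rest overflow into *args.
# Unrecognized keywords are collected in **kwargs.
full = describe_call(1, 2, 3, 4, 5, kwonly_arg='k', city='Paris')
print(full['default_arg'])  # 3
print(full['args'])         # (4, 5)
print(full['kwargs'])       # {'city': 'Paris'}
```

Because `kwonly_arg` comes after `*args`, it can only be supplied by name; omitting it raises a TypeError.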

⚠️ Warning:

python
# ❌ Don't use mutable objects as default parameters
def add_item(item, items=[]):
    items.append(item)
    return items

# ✅ Correct approach
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
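
Why the first version is a problem becomes visible when it is called twice. In this sketch the two variants are renamed `add_item_shared` and `add_item_fresh` (names chosen here for illustration) so they can run side by side.

```python
def add_item_shared(item, items=[]):
    # The default list is created once, when the def statement runs,
    # and is reused by every call that omits items.
    items.append(item)
    return items

def add_item_fresh(item, items=None):
    # A new list is created on each call that omits items.
    if items is None:
        items = []
    items.append(item)
    return items

first = add_item_shared('a')
second = add_item_shared('b')
print(second)           # ['a', 'b']
print(first is second)  # True: both names point at the same default list

print(add_item_fresh('a'))  # ['a']
print(add_item_fresh('b'))  # ['b']
```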

3. Lambda Functions

Purpose:

  • Anonymous functions
  • Single-line expressions
  • Used for simple one-time operations

Syntax:

python
lambda parameters: expression

Lambda vs Regular Function

| Feature | Lambda | Regular Function |
|---|---|---|
| Syntax | lambda x: x**2 | def f(x): return x**2 |
| Name | Anonymous | Named |
| Complexity | Single-line expression | Multi-line |
| Documentation | Cannot add | Can add docstring |
| Use Cases | Simple, one-time | Complex, reusable |
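
A typical "simple, one-time" use is passing a lambda as an argument to `sorted`, `filter`, or `map`. The sample records below are invented for illustration.

```python
respondents = [
    {'name': 'Alice', 'income': 80000},
    {'name': 'Bob', 'income': 50000},
    {'name': 'Chen', 'income': 65000},
]

# Lambda as a throwaway sort key
by_income = sorted(respondents, key=lambda r: r['income'])
print([r['name'] for r in by_income])  # ['Bob', 'Chen', 'Alice']

# Lambda with filter and map
high_earners = list(filter(lambda r: r['income'] > 60000, respondents))
incomes_in_thousands = list(map(lambda r: r['income'] / 1000, respondents))
print(len(high_earners))      # 2
print(incomes_in_thousands)   # [80.0, 50.0, 65.0]
```

If the same expression is needed in more than one place, or needs a docstring, a named `def` is usually the better choice.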

4. Modules

Concepts:

  • Module: Single .py file
  • Package: Folder containing __init__.py
  • Library: Collection of related packages

Import Methods:

python
# 1. Import entire module
import math
math.sqrt(16)

# 2. Import with alias
import pandas as pd

# 3. Import specific functions
from math import sqrt, pi

# 4. Import all (NOT recommended)
from math import *  # ❌ Can cause name conflicts

Common Standard Libraries

| Library | Purpose | Key Functions |
|---|---|---|
| math | Mathematical operations | sqrt(), log(), exp(), pi, e |
| statistics | Statistics | mean(), median(), stdev(), variance() |
| random | Random numbers | randint(), random(), choice(), sample() |
| datetime | Date and time | datetime.now(), timedelta(), strftime() |
| json | JSON data handling | dumps(), loads(), dump(), load() |
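
A short tour of the modules in the table, using one or two functions from each. The data values and the fixed date are arbitrary choices for this sketch.

```python
import json
import math
import random
import statistics
from datetime import datetime, timedelta

incomes = [30000, 50000, 75000]

print(math.sqrt(16))               # 4.0
print(statistics.median(incomes))  # 50000

# Seeding makes the random draw repeatable from run to run.
random.seed(0)
sample = random.sample(incomes, 2)
print(len(sample))                 # 2

# Round-trip a small result through JSON text.
text = json.dumps({'median': statistics.median(incomes)})
print(text)                        # {"median": 50000}
print(json.loads(text)['median'])  # 50000

# Date arithmetic with timedelta, formatted with strftime.
deadline = datetime(2024, 1, 1) + timedelta(days=30)
print(deadline.strftime('%Y-%m-%d'))  # 2024-01-31
```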

Comparison: Python vs Stata vs R

Function Definition Syntax

Python:

python
def calculate_mean(data):
    return sum(data) / len(data)

Stata:

stata
program define calc_mean, rclass
    args varname
    summarize `varname'
    return scalar mean = r(mean)
end

R:

r
calculate_mean <- function(data) {
  mean(data)
}

Package Management

| Task | Python | Stata | R |
|---|---|---|---|
| Install | pip install pandas | ssc install outreg2 | install.packages("dplyr") |
| Import | import pandas as pd | which outreg2 | library(dplyr) |
| List installed | pip list | ado dir | installed.packages() |

Common Pitfalls

1. Don't Use Mutable Objects as Default Parameters

python
# ❌ Wrong
def add_student(name, courses=[]):
    courses.append(name)
    return courses

# ✅ Correct
def add_student(name, courses=None):
    if courses is None:
        courses = []
    courses.append(name)
    return courses

2. Remember to Use return

python
# ❌ Wrong
def calculate_tax(income):
    tax = income * 0.25  # Missing return

# ✅ Correct
def calculate_tax(income):
    return income * 0.25
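
The effect of the missing return is easy to check by printing the result of each version. The first function is renamed `calculate_tax_no_return` here only so both can coexist in one script.

```python
def calculate_tax_no_return(income):
    tax = income * 0.25  # computed, then discarded when the function ends

def calculate_tax(income):
    return income * 0.25

print(calculate_tax_no_return(40000))  # None
print(calculate_tax(40000))            # 10000.0
```

A silent None often surfaces later as a TypeError, for example when the result is used in arithmetic.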

3. Parameter Order Matters

python
# ❌ Wrong
def register(name="Alice", age, major):  # SyntaxError
    pass

# ✅ Correct
def register(age, major, name="Alice"):
    pass

Best Practices

1. Function Naming

python
# ✅ Good naming (verb-based, descriptive)
def calculate_total(price, quantity): ...
def validate_email(email): ...
def is_adult(age): ...

# ❌ Bad naming
def f(x, y): ...
def data(): ...

2. Single Responsibility

python
# ❌ Function does too many things
def process_data(data):
    # Clean + Analyze + Plot + Save
    pass

# ✅ Split into multiple functions
def clean_data(data):
    pass

def analyze_data(data):
    pass
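
As a rough illustration of the split, the sketch below gives the two small functions simple bodies and composes them. The bodies (dropping None values, computing a count and mean) are assumptions made for this example, not part of the original outline.

```python
def clean_data(data):
    """Drop missing (None) values."""
    return [x for x in data if x is not None]

def analyze_data(data):
    """Summarize already-cleaned values."""
    return {'count': len(data), 'mean': sum(data) / len(data)}

raw = [10, None, 20, 30, None]

# Each step can be tested on its own, then chained.
summary = analyze_data(clean_data(raw))
print(summary)  # {'count': 3, 'mean': 20.0}
```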

3. Write Docstrings

python
def calculate_gini(incomes):
    """Calculate Gini coefficient

    Parameters:
        incomes (list): List of income values

    Returns:
        float: Gini coefficient (0-1)

    Example:
        >>> calculate_gini([10000, 20000, 30000])
        0.222
    """
    pass
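
A docstring is stored on the function object, so tools can read it at runtime. The sketch below uses a small hypothetical function, `calculate_share`, documented in the same layout as above, and reads the text back with `__doc__` and `inspect.getdoc`.

```python
import inspect

def calculate_share(part, total):
    """Return part as a fraction of total.

    Parameters:
        part (float): Portion of the total
        total (float): Nonzero total

    Returns:
        float: part divided by total
    """
    return part / total

# The raw docstring is available as an attribute.
print(calculate_share.__doc__.splitlines()[0])  # Return part as a fraction of total.

# inspect.getdoc returns the same text with indentation cleaned up.
print(inspect.getdoc(calculate_share))

print(calculate_share(1, 4))  # 0.25
```

In an interactive session, `help(calculate_share)` displays the same text.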

Comprehensive Exercises

Exercise 1: Progressive Tax Calculator

Difficulty: ⭐⭐⭐ Time: 15 minutes

Write a function that calculates progressive tax:

python
def calculate_progressive_tax(income, brackets):
    """
    Calculate progressive tax

    Parameters:
        income: Total income
        brackets: Tax brackets [(limit, rate), ...]
    """
    pass

# Test
brackets = [(50000, 0.10), (100000, 0.20), (float('inf'), 0.30)]
print(calculate_progressive_tax(75000, brackets))  # 10000.0 (50000 * 0.10 + 25000 * 0.20)

Solution:

python
def calculate_progressive_tax(income, brackets):
    """Calculate progressive tax"""
    tax = 0
    previous_limit = 0

    for limit, rate in brackets:
        if income <= previous_limit:
            break

        taxable = min(income, limit) - previous_limit
        tax += taxable * rate
        previous_limit = limit

    return tax

# Test
brackets = [(50000, 0.10), (100000, 0.20), (float('inf'), 0.30)]
assert calculate_progressive_tax(40000, brackets) == 4000
assert calculate_progressive_tax(75000, brackets) == 10000   # 5000 + 5000
assert calculate_progressive_tax(120000, brackets) == 21000  # 5000 + 10000 + 6000
print("All tests passed!")
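
To check a result by hand, it can help to see the tax owed in each bracket separately. The helper below, `tax_by_bracket`, is a hypothetical variant of the same loop that records one row per bracket instead of a running total.

```python
def tax_by_bracket(income, brackets):
    """Return a list of (lower, upper, rate, tax) rows for each bracket reached."""
    rows = []
    previous_limit = 0
    for limit, rate in brackets:
        if income <= previous_limit:
            break
        # Only the slice of income inside this bracket is taxed at its rate.
        taxable = min(income, limit) - previous_limit
        rows.append((previous_limit, limit, rate, taxable * rate))
        previous_limit = limit
    return rows

brackets = [(50000, 0.10), (100000, 0.20), (float('inf'), 0.30)]
rows = tax_by_bracket(75000, brackets)

for lower, upper, rate, tax in rows:
    print(f"{lower} to {upper} at {rate:.0%}: {tax:,.0f}")

total = sum(row[3] for row in rows)
print(total)  # 10000.0
```

For an income of 75000, the first 50000 is taxed at 10% and the remaining 25000 at 20%, so only two brackets appear.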

Exercise 2: Data Filter with Flexible Criteria

Difficulty: ⭐⭐⭐ Time: 20 minutes

python
def filter_respondents(data, **criteria):
    """
    Filter respondents by any criteria

    Possible criteria:
        min_age, max_age, gender, min_income, education, city
    """
    pass

# Test
respondents = [
    {'id': 1, 'age': 25, 'gender': 'F', 'income': 50000},
    {'id': 2, 'age': 35, 'gender': 'M', 'income': 80000},
]

result = filter_respondents(respondents, min_age=30, gender='M')

Solution:

python
def filter_respondents(data, **criteria):
    """Filter respondents by any criteria"""
    filtered = []

    for person in data:
        match = True

        # Check age
        if 'min_age' in criteria and person.get('age', 0) < criteria['min_age']:
            match = False
        if 'max_age' in criteria and person.get('age', 999) > criteria['max_age']:
            match = False

        # Check income
        if 'min_income' in criteria and person.get('income', 0) < criteria['min_income']:
            match = False

        # Check exact matches
        for field in ['gender', 'education', 'city']:
            if field in criteria and person.get(field) != criteria[field]:
                match = False

        if match:
            filtered.append(person)

    return filtered
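
One possible alternative is to drive the checks from a lookup table instead of a chain of if statements. This is a sketch, not a drop-in replacement: `filter_respondents_compact` is a hypothetical name, it treats a missing field as a failed range check, and it treats every criterion not in the table as an exact match (the solution above only matches gender, education, and city).

```python
def filter_respondents_compact(data, **criteria):
    """Keep the records for which every supplied criterion holds."""
    # Map each range criterion to (record field, comparison function).
    range_checks = {
        'min_age': ('age', lambda value, bound: value >= bound),
        'max_age': ('age', lambda value, bound: value <= bound),
        'min_income': ('income', lambda value, bound: value >= bound),
    }

    def matches(person):
        for key, expected in criteria.items():
            if key in range_checks:
                field, passes = range_checks[key]
                value = person.get(field)
                if value is None or not passes(value, expected):
                    return False
            elif person.get(key) != expected:
                return False
        return True

    return [person for person in data if matches(person)]

respondents = [
    {'id': 1, 'age': 25, 'gender': 'F', 'income': 50000},
    {'id': 2, 'age': 35, 'gender': 'M', 'income': 80000},
]

result = filter_respondents_compact(respondents, min_age=30, gender='M')
print([person['id'] for person in result])  # [2]
```

Adding a new range criterion then means adding one entry to `range_checks` rather than another if block.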

Exercise 3: Data Validation Module

Difficulty: ⭐⭐⭐ Time: 30 minutes

Create a module with validation functions:

python
def validate_age(age, min_age=18, max_age=100):
    """Validate age, return (is_valid, error_message)"""
    pass

def validate_income(income, min_income=0):
    """Validate income"""
    pass

def validate_email(email):
    """Validate email format"""
    pass

def validate_response(response, rules):
    """Validate entire response"""
    pass

Solution (partial):

python
def validate_age(age, min_age=18, max_age=100):
    if not isinstance(age, (int, float)):
        return False, "Age must be a number"
    if age < min_age:
        return False, f"Age must be at least {min_age}"
    if age > max_age:
        return False, f"Age cannot exceed {max_age}"
    return True, ""

def validate_email(email):
    if not isinstance(email, str):
        return False, "Email must be a string"
    if '@' not in email or '.' not in email.split('@')[1]:
        return False, "Invalid email format"
    return True, ""

def validate_response(response, rules=None):
    errors = []

    if 'age' in response:
        if rules and 'age' in rules:
            min_age, max_age = rules['age']
            is_valid, error = validate_age(response['age'], min_age, max_age)
        else:
            is_valid, error = validate_age(response['age'])
        if not is_valid:
            errors.append(f"Age: {error}")

    if 'email' in response:
        is_valid, error = validate_email(response['email'])
        if not is_valid:
            errors.append(f"Email: {error}")

    return len(errors) == 0, errors
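
The partial solution leaves out `validate_income`. One way it might look, following the same (is_valid, error_message) pattern as `validate_age`, is sketched below. The error messages and the explicit rejection of booleans are assumptions of this sketch.

```python
def validate_income(income, min_income=0):
    """Validate income, return (is_valid, error_message)."""
    # bool is a subclass of int, so True/False would otherwise pass as numbers.
    if isinstance(income, bool) or not isinstance(income, (int, float)):
        return False, "Income must be a number"
    if income < min_income:
        return False, f"Income must be at least {min_income}"
    return True, ""

print(validate_income(50000))    # (True, '')
print(validate_income(-100))     # (False, 'Income must be at least 0')
print(validate_income("50000"))  # (False, 'Income must be a number')
```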

Exercise 4: Data Processing Pipeline

Difficulty: ⭐⭐⭐ Time: 30 minutes

python
def create_pipeline(*functions):
    """Create data processing pipeline"""
    pass

# Example
normalize = lambda x: x / 10000
discount = lambda x: x * 0.8
round_result = lambda x: round(x, 2)

pipeline = create_pipeline(normalize, discount, round_result)
result = pipeline(75000)  # 6.0

Solution:

python
from functools import reduce

def create_pipeline(*functions):
    """Create data processing pipeline"""
    def pipeline(data):
        result = data
        for func in functions:
            result = func(result)
        return result
    return pipeline

# Alternative using reduce
def create_pipeline_v2(*functions):
    return lambda data: reduce(lambda x, f: f(x), functions, data)

# Test
normalize = lambda x: x / 10000
discount = lambda x: x * 0.8
round_result = lambda x: round(x, 2)

pipeline = create_pipeline(normalize, discount, round_result)
print(pipeline(75000))   # 6.0
print(pipeline(120000))  # 9.6

Exercise 5: Module Organization

Difficulty: ⭐⭐⭐ Time: 40 minutes

Create a module structure for survey analysis:

survey_project/
├── utils/
│   ├── __init__.py
│   ├── validation.py
│   └── stats.py
├── analysis/
│   ├── __init__.py
│   └── descriptive.py
└── main.py

Solution structure:

python
# utils/stats.py
def calculate_mean(values):
    return sum(values) / len(values) if values else 0

def calculate_median(values):
    if not values:
        return 0  # Same convention as calculate_mean; avoids IndexError on empty input
    sorted_values = sorted(values)
    n = len(sorted_values)
    if n % 2 == 0:
        return (sorted_values[n//2-1] + sorted_values[n//2]) / 2
    return sorted_values[n//2]

# analysis/descriptive.py
from utils.stats import calculate_mean, calculate_median

def describe_variable(data, variable):
    values = [record[variable] for record in data if variable in record]
    return {
        'count': len(values),
        'mean': calculate_mean(values),
        'median': calculate_median(values),
        'min': min(values) if values else 0,
        'max': max(values) if values else 0
    }

# main.py
from analysis.descriptive import describe_variable

data = [
    {'age': 25, 'income': 50000},
    {'age': 35, 'income': 80000},
]

stats = describe_variable(data, 'income')
print(stats)

Exercise 6: Income Inequality Analysis

Difficulty: ⭐⭐⭐⭐ Time: 45 minutes

Calculate income inequality measures:

python
def calculate_gini(incomes):
    """Calculate Gini coefficient"""
    pass

def calculate_quintiles(incomes):
    """Calculate income quintiles"""
    pass

def analyze_inequality(data):
    """Comprehensive income inequality analysis"""
    pass

Solution (Gini coefficient):

python
def calculate_gini(incomes):
    """Calculate Gini coefficient"""
    valid_incomes = [inc for inc in incomes if inc > 0]
    if len(valid_incomes) <= 1:
        return 0.0

    sorted_incomes = sorted(valid_incomes)
    n = len(sorted_incomes)

    # Formula: G = (2 * Σ(i * x_i)) / (n * Σx_i) - (n+1)/n
    numerator = sum((i + 1) * income for i, income in enumerate(sorted_incomes))
    denominator = n * sum(sorted_incomes)

    gini = (2 * numerator) / denominator - (n + 1) / n
    return round(gini, 4)

# Test
incomes = [30000, 50000, 75000, 120000, 200000]
print(f"Gini coefficient: {calculate_gini(incomes)}")  # 0.3453
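
The solution covers only the Gini coefficient. For `calculate_quintiles`, one option is to lean on the standard library: `statistics.quantiles` with `n=5` returns the four cut points that divide the data into five groups of roughly equal size. This is a sketch under that interpretation of "quintiles"; the exercise may instead intend income shares per quintile. The explicit length check is an assumption added here.

```python
import statistics

def calculate_quintiles(incomes):
    """Return the four cut points that split incomes into five groups."""
    if len(incomes) < 2:
        raise ValueError("Need at least two income values")
    # n=5 asks for quintiles; the result has n - 1 = 4 cut points.
    return statistics.quantiles(incomes, n=5)

incomes = [30000, 50000, 75000, 120000, 200000]
cut_points = calculate_quintiles(incomes)

print(len(cut_points))  # 4
print(cut_points)
```

With very small samples the cut points are interpolated between observations, so they need not coincide with actual income values.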

Exercise 7: Recursive Functions

Difficulty: ⭐⭐⭐⭐ Time: 35 minutes

Implement recursive algorithms:

python
organization = {
    'name': 'CEO',
    'salary': 500000,
    'subordinates': [
        {'name': 'VP', 'salary': 300000, 'subordinates': []}
    ]
}

def count_employees(org):
    """Recursively count employees"""
    pass

def calculate_total_salary(org):
    """Recursively calculate total salary"""
    pass

Solution:

python
def count_employees(org):
    """Recursively count employees"""
    return 1 + sum(count_employees(sub) for sub in org.get('subordinates', []))

def calculate_total_salary(org):
    """Recursively calculate total salary"""
    return org['salary'] + sum(
        calculate_total_salary(sub) for sub in org.get('subordinates', [])
    )

def get_max_depth(org, current_depth=1):
    """Recursively calculate organizational depth"""
    subordinates = org.get('subordinates', [])
    if not subordinates:
        return current_depth
    return max(get_max_depth(sub, current_depth + 1) for sub in subordinates)

# Test
organization = {
    'name': 'CEO',
    'salary': 500000,
    'subordinates': [
        {
            'name': 'VP',
            'salary': 300000,
            'subordinates': [
                {'name': 'Manager', 'salary': 150000, 'subordinates': []}
            ]
        }
    ]
}

print(f"Employees: {count_employees(organization)}")           # 3
print(f"Total salary: ${calculate_total_salary(organization):,}") # $950,000
print(f"Depth: {get_max_depth(organization)}")           # 3

Key Takeaways

You've now mastered:

  • ✅ Function definition and calling
  • ✅ All parameter types
  • ✅ Lambda functions and functional programming
  • ✅ Module and package management

You've completed Module 5! 🎉

In Module 6, we'll dive into Object-Oriented Programming (OOP).


Ready to tackle OOP? Let's go! 🚀

Released under the MIT License. Content © Author.