
Module 5: Functions and Modules

Making Code Reusable — From Repetitive Tasks to Elegant Programming


Chapter Overview

When writing code, you'll often find many operations need to be executed repeatedly: calculating statistics, cleaning data, running regressions. Functions allow you to encapsulate these operations for reuse anytime, avoiding code duplication. Modules let you organize and reuse functions written by others (or yourself). Mastering functions and modules will elevate your code from "scripts" to "programs."


Learning Objectives

After completing this chapter, you will be able to:

  • ✅ Understand the concept and purpose of functions
  • ✅ Define and call your own functions
  • ✅ Master parameter passing (positional, keyword, default parameters)
  • ✅ Use *args and **kwargs to handle variable arguments
  • ✅ Understand Lambda functions and their use cases
  • ✅ Import and use standard libraries and third-party modules
  • ✅ Organize your own code into modules and packages
  • ✅ Compare Python's def with Stata's program and R's function()

Chapter Contents

5.2 - Function Basics

Core Question: How to make code reusable?

Core Content:

  • Function definition and calling
  • Parameters and return values
  • Docstrings
  • Local vs. global variables
  • The meaning of None
  • Comparison with Stata (program define) and R (function())

Practical Application:

python
def calculate_bmi(weight, height):
    """Calculate body mass index (BMI)

    Parameters:
        weight: Weight in kilograms
        height: Height in meters

    Returns:
        BMI value
    """
    bmi = weight / (height ** 2)
    return bmi

# Call the function
result = calculate_bmi(70, 1.75)
print(f"BMI: {result:.2f}")  # BMI: 22.86
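
Two items from the Core Content list above, local vs. global variables and the meaning of None, can be sketched as follows. This is a minimal illustration: TAX_RATE, net_income, and report are hypothetical names and values chosen for the example, not part of the chapter.

```python
TAX_RATE = 0.2  # global (module-level) variable; hypothetical value


def net_income(gross):
    """Return income after tax."""
    tax = gross * TAX_RATE  # 'tax' is local: it exists only inside this function
    return gross - tax


def report(gross):
    """Print a summary; there is no return statement."""
    print(f"Net income: {net_income(gross):.0f}")


net = net_income(50000)
nothing = report(50000)  # prints the summary, then returns None implicitly

print(net)
print(nothing is None)  # True
```

Reading a global inside a function works without extra syntax, while a function that never reaches a return statement hands back None to its caller.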

Research Scenarios:

  • Calculate statistics (mean, standard deviation, quantiles)
  • Data cleaning (handling missing values, outlier detection)
  • Repeated analysis (multiple models, multiple subsamples)
  • Robustness checks
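
As a rough sketch of the first and third scenarios above, one function can compute summary statistics and then be reused across subsamples. The function name summarize and the subsample data are made up for illustration; only the standard-library statistics module is assumed.

```python
import statistics


def summarize(values):
    """Return mean, sample standard deviation, and median of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (divides by n - 1)
        "median": statistics.median(values),
    }


# Hypothetical subsamples, invented for this example
subsamples = {
    "urban": [52000, 61000, 75000, 58000],
    "rural": [38000, 42000, 40000, 47000],
}

for name, incomes in subsamples.items():
    s = summarize(incomes)
    print(f"{name}: mean={s['mean']:.0f}, sd={s['sd']:.0f}, median={s['median']:.0f}")
```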

5.3 - Function Arguments

Core Question: How to flexibly pass data to functions?

Core Content:

  • Positional arguments: Pass by order
  • Keyword arguments: Pass by name
  • Default arguments: Provide default values
  • Variable positional arguments (*args): Accept any number of positional arguments
  • Variable keyword arguments (**kwargs): Accept any number of keyword arguments
  • Parameter order rules
  • Argument unpacking (*, **)

Practical Application:

python
def run_regression(y, X, model_type="OLS", robust=True, **options):
    """Run regression analysis

    Parameters:
        y: Dependent variable
        X: Independent variables
        model_type: Model type (default OLS)
        robust: Whether to use robust standard errors (default True)
        **options: Additional options
    """
    print(f"Model: {model_type}")
    print(f"Robust standard errors: {robust}")
    print(f"Other options: {options}")

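# Toy data so the calls below run as written (placeholder values)
y = [1.2, 2.3, 3.1]
X = [[1.0], [2.0], [3.0]]

# Flexible calling
run_regression(y, X)  # Use default parameters
run_regression(y, X, model_type="Logit")  # Change model type
run_regression(y, X, robust=False, cluster="firm_id")  # Pass extra options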
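
The remaining items from the Core Content list (*args, parameter order, and argument unpacking) might look like this in practice. The function describe and its parameters are hypothetical, written only to show the mechanics.

```python
def describe(label, *values, digits=2, **extra):
    """Signature order: positional, *args, keyword-only with default, **kwargs."""
    mean = sum(values) / len(values)
    return f"{label}: mean={mean:.{digits}f}, n={len(values)}, extra={extra}"


# Any number of positional values is collected into the tuple 'values'
print(describe("income", 50000, 60000, 75000))

# * unpacks a list into positional arguments, ** unpacks a dict into keyword arguments
data = [1.5, 2.5, 3.5]
settings = {"digits": 1, "unit": "kg"}
print(describe("weight", *data, **settings))
```

Because digits comes after *values, it can only be passed by name; any keyword the signature does not list (here unit) ends up in the extra dictionary.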

Research Scenarios:

  • Provide default configurations (significance level, iteration count)
  • Flexible model options
  • Batch processing (pass multiple parameter combinations)
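
Batch processing over several parameter combinations could be sketched as below. The function run_model is a hypothetical stand-in that only echoes its configuration; a real analysis would fit a model at that point.

```python
from itertools import product


def run_model(model_type="OLS", robust=True, alpha=0.05):
    """Hypothetical stand-in: returns its configuration instead of fitting a model."""
    return {"model_type": model_type, "robust": robust, "alpha": alpha}


# Every combination of two model types and two robustness settings
grid = [
    {"model_type": m, "robust": r}
    for m, r in product(["OLS", "Logit"], [True, False])
]

# ** unpacks each dict into keyword arguments; alpha keeps its default
results = [run_model(**config) for config in grid]
for res in results:
    print(res)
```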

5.4 - Lambda Functions

Core Question: When to use anonymous functions?

Core Content:

  • Lambda syntax: lambda arguments: expression
  • Comparison with regular functions
  • Use cases:
    • Sorting (sorted(), list.sort())
    • Mapping (map())
    • Filtering (filter())
    • Pandas data operations
  • Limitations of Lambda

Practical Application:

python
import numpy as np
import pandas as pd

# Sort student list by income
students = [
    {"name": "Alice", "income": 50000},
    {"name": "Bob", "income": 75000},
    {"name": "Carol", "income": 60000}
]

sorted_students = sorted(students, key=lambda x: x["income"])

# Pandas data cleaning: log of positive incomes, NaN otherwise
df = pd.DataFrame(students)
df["log_income"] = df["income"].apply(lambda x: np.log(x) if x > 0 else np.nan)

# Filter high earners
high_earners = list(filter(lambda x: x["income"] > 60000, students))

When to Use Lambda?

  • ✅ Simple one-time functions
  • ✅ Sorting, mapping, filtering
  • ✅ Pandas apply() operations
  • ❌ Complex logic (use regular functions)
  • ❌ Need docstrings (use regular functions)
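
To make the checklist above concrete, here is a small sketch that uses a lambda with map() for a one-line transformation and a regular function once branching and a docstring are needed. The income cutoffs are hypothetical, chosen only for illustration.

```python
incomes = [50000, 75000, 60000]

# map() with a lambda: convert each income to thousands
in_thousands = list(map(lambda x: x / 1000, incomes))


# A lambda is limited to a single expression. Once the logic needs
# branches or documentation, a named function reads better.
def income_bracket(income):
    """Classify an income using hypothetical cutoffs."""
    if income < 55000:
        return "low"
    if income < 70000:
        return "middle"
    return "high"


brackets = [income_bracket(x) for x in incomes]

print(in_thousands)  # [50.0, 75.0, 60.0]
print(brackets)      # ['low', 'high', 'middle']
```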

5.5 - Modules and Packages

Core Question: How to organize and reuse code?

Core Content:

  • Module: Single .py file
  • Package: Folder containing multiple modules (usually with an __init__.py file)
  • Import methods:
    • import module
    • from module import function
    • import module as alias
    • from module import * (not recommended)
  • Standard library introduction:
    • math: Mathematical functions
    • statistics: Statistical functions
    • random: Random number generation
    • datetime: Date and time handling
    • os, sys: System operations
  • Third-party library installation (pip install)
  • Creating your own modules and packages

Practical Application:

python
# Import standard library
import math
import statistics as stats
from datetime import datetime

# Use mathematical functions
log_income = math.log(50000)
sqrt_value = math.sqrt(100)

# Calculate statistics
mean_income = stats.mean([50000, 60000, 75000])
median_income = stats.median([50000, 60000, 75000])

# Import data analysis libraries
import numpy as np
import pandas as pd
import statsmodels.api as sm

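# Create your own module
# --- my_stats.py ---
import numpy as np  # each module needs its own imports

def winsorize(data, lower=0.01, upper=0.99):
    """Winsorize: clip values outside the lower/upper quantiles"""
    p_lower = np.quantile(data, lower)
    p_upper = np.quantile(data, upper)
    return np.clip(data, p_lower, p_upper)

# --- another file in the same folder ---
from my_stats import winsorize

incomes = [50000, 60000, 75000, 1000000]  # example data
clean_data = winsorize(incomes)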
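
The import forms and standard-library modules listed under Core Content can be tried out with a short standard-library-only sketch like the one below. The file name survey.csv is a hypothetical placeholder; no file is read.

```python
import math                      # import module
from statistics import median    # from module import function
import random as rnd             # import module as alias
from datetime import date
import os

print(math.sqrt(100))                 # 10.0
print(median([50000, 60000, 75000]))  # 60000

rnd.seed(42)              # a fixed seed makes the draw reproducible
draw = rnd.randint(1, 6)  # integer between 1 and 6, inclusive

days = (date(2024, 3, 1) - date(2024, 1, 1)).days  # 2024 is a leap year
print(days)               # 60

path = os.path.join("data", "survey.csv")  # hypothetical file name
print(os.path.basename(path))              # survey.csv
```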

Research Scenarios:

  • Use NumPy for numerical computation
  • Use Pandas for data processing
  • Use statsmodels for regression
  • Organize your own data cleaning function library

5.6 - Summary and Review

Content:

  • Function design best practices
  • Common errors and debugging tips
  • Module organization recommendations
  • Comprehensive exercises
  • Stata/R/Python comparison

Stata vs R vs Python Function Comparison

Defining Functions

Language   Syntax
Stata      program define func_name
R          func_name <- function(args) { ... }
Python     def func_name(args): ...

Example: Calculate BMI

Stata:

stata
program define calc_bmi
    args weight height
    gen bmi = `weight' / (`height'^2)
end

calc_bmi 70 1.75

R:

r
calc_bmi <- function(weight, height) {
  bmi <- weight / (height^2)
  return(bmi)
}

result <- calc_bmi(70, 1.75)

Python:

python
def calc_bmi(weight, height):
    bmi = weight / (height ** 2)
    return bmi

result = calc_bmi(70, 1.75)

How to Learn This Chapter?

Learning Path

Day 1 (3 hours): Function Basics

  • 📖 Read 5.2 - Function Basics
  • 💻 Define simple functions
  • 📝 Understand return values and docstrings

Day 2 (3 hours): Function Arguments

  • 📖 Read 5.3 - Function Arguments
  • 💻 Practice positional/keyword/default parameters
  • 📝 Understand *args and **kwargs

Day 3 (2 hours): Lambda Functions

  • 📖 Read 5.4 - Lambda Functions
  • 💻 Use Lambda for sorting and filtering
  • 📝 Use Lambda in Pandas

Day 4 (3 hours): Modules and Packages

  • 📖 Read 5.5 - Modules and Packages
  • 💻 Import and use standard libraries
  • 📝 Create your own modules

Day 5 (2 hours): Review and Comprehensive Application

  • 📖 Complete 5.6 - Summary and Review
  • 💻 Write a complete analysis function library
  • 📝 Organize code into modules

Total Time: 13 hours (1-2 weeks)

Minimal Learning Path

If time is limited:

Must Learn (Core concepts, 8 hours):

  • ✅ 5.2 - Function Basics (complete study)
  • ✅ 5.3 - Function Arguments (positional/keyword/default)
  • ✅ 5.5 - Modules and Packages (import standard libraries)

Optional (Advanced techniques):

  • 📌 *args and **kwargs
  • 📌 Lambda functions
  • 📌 Creating your own packages

Learning Recommendations

  1. Start from Requirements

    • Discover repeated code → Extract into function
    • Need configuration options → Use default parameters
    • Code getting too long → Split into modules
  2. Function Design Principles

    • Single Responsibility: One function does one thing
    • Meaningful Names: calculate_bmi() is better than func1()
    • Write Docstrings: Explain functionality, parameters, return values
    • Avoid Side Effects: Try not to modify global variables
  3. Practice Project

    Create a my_stats.py module containing:

    python
    # my_stats.py
    import numpy as np
    
    def mean(data):
        """Calculate the mean"""
        return sum(data) / len(data)
    
    def std(data):
        """Calculate the population standard deviation (divides by n, not n - 1)"""
        m = mean(data)
        variance = sum((x - m) ** 2 for x in data) / len(data)
        return variance ** 0.5
    
    def winsorize(data, lower=0.01, upper=0.99):
        """Winsorize: clip values outside the lower/upper quantiles"""
        p_lower = np.quantile(data, lower)
        p_upper = np.quantile(data, upper)
        return np.clip(data, p_lower, p_upper)
  4. Comparative Learning

    • Reproduce Stata's program in Python
    • Reproduce R's function() in Python
    • Understand the philosophy behind syntax differences
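
The "Avoid Side Effects" principle from the Function Design Principles above mentions global variables; a closely related case is a function that silently changes the data passed to it. The sketch below contrasts the two styles using hypothetical function names.

```python
def demean_in_place(values):
    """Side effect: overwrites the caller's list."""
    m = sum(values) / len(values)
    for i in range(len(values)):
        values[i] -= m


def demean(values):
    """No side effect: returns a new list and leaves the input untouched."""
    m = sum(values) / len(values)
    return [x - m for x in values]


scores = [1.0, 2.0, 3.0]
centered = demean(scores)
print(scores)    # [1.0, 2.0, 3.0]  (unchanged)
print(centered)  # [-1.0, 0.0, 1.0]

demean_in_place(scores)
print(scores)    # [-1.0, 0.0, 1.0]  (the original values are gone)
```

The second version is usually easier to reason about in an analysis pipeline, because the raw data stays available after the call.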

Common Questions

Q: What's the difference between functions and variables? A: Variables store data, functions store operations. Variables are "nouns," functions are "verbs."

Q: When should I write a function? A: Follow the DRY principle (Don't Repeat Yourself). As a rule of thumb, once a piece of code is used more than 3 times, extract it into a function.
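
A minimal before-and-after sketch of the DRY idea, using made-up data and a hypothetical rescale function:

```python
income = [50000, 60000, 75000]
age = [25, 40, 55]

# Before: the same rescaling formula is copied for every variable
income_scaled = [(x - min(income)) / (max(income) - min(income)) for x in income]
age_scaled = [(x - min(age)) / (max(age) - min(age)) for x in age]


# After: the formula lives in one place and is reused
def rescale(values):
    """Rescale values to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


print(rescale(income) == income_scaled)  # True
print(rescale(age))                      # [0.0, 0.5, 1.0]
```

If the formula ever needs to change, the function version requires one edit instead of one per variable.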

Q: Why do we need docstrings? A: In a few weeks, you'll forget how the function works. Docstrings are documentation for your future self (and collaborators).
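
A docstring is not just a comment: Python stores it on the function object, so it can be looked up later. A short sketch, reusing the chapter's calculate_bmi with a one-line docstring:

```python
def calculate_bmi(weight, height):
    """Calculate BMI from weight (kg) and height (m)."""
    return weight / (height ** 2)


print(calculate_bmi.__doc__)  # the docstring is stored on the function object
help(calculate_bmi)           # prints the signature together with the docstring
```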

Q: What's the difference between return and print()? A:

  • return: Returns value to caller, can be assigned to a variable
  • print(): Only displays on screen, doesn't return a value
python
def bad_function(x):
    print(x * 2)  # Only prints, doesn't return

def good_function(x):
    return x * 2  # Returns value

result = bad_function(5)  # Prints 10, but result is None
result = good_function(5)  # result is 10

Q: When to use Lambda vs regular functions? A:

  • Lambda: Simple one-line expressions (sorting, mapping, filtering)
  • Regular function: Complex logic, multiple lines of code, needs docstring

Q: How to find useful third-party libraries? A: Start from the task at hand. Widely used libraries by area:

  • Data analysis: numpy, pandas, statsmodels
  • Data visualization: matplotlib, seaborn, plotly
  • Machine learning: scikit-learn, xgboost, lightgbm
  • Text analysis: nltk, spaCy, transformers
  • Web scraping: requests, beautifulsoup4, scrapy
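
Before installing anything, it can help to check which of these libraries are already available. The sketch below uses importlib.util.find_spec from the standard library; the helper name is_installed is hypothetical. Note that the import name can differ from the pip name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this import name."""
    return importlib.util.find_spec(module_name) is not None


# Import names can differ from pip names: beautifulsoup4 is imported as bs4,
# and scikit-learn as sklearn.
for name in ["numpy", "pandas", "bs4", "sklearn"]:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Anything reported as missing can then be installed with pip install, using the pip name of the package.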

Next Steps

After completing this chapter, you will have mastered:

  • ✅ Writing reusable functions
  • ✅ Flexibly using parameter passing
  • ✅ Importing and organizing modules
  • ✅ Elevating code from "scripts" to "programs"

In Modules 6-7, we'll learn Pandas, the core library for Python data analysis, integrating all the concepts we've learned!

In Modules 8-9, we'll learn data visualization and advanced data processing techniques.

Congratulations! After completing the first 5 modules, you've mastered Python's core syntax! Now it's time for practical data analysis!


Released under the MIT License. Content © Author.