11.3 Debugging and Profiling

Making Code Run Correctly and Quickly


Debugging Techniques

1. Print Debugging (Simplest)

python
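def process_survey(data):
    # Temporary debug output: confirm what the function actually received
    print(f"Debug: input type {type(data)}, length {len(data)}")

    filtered = [x for x in data if x > 0]
    print(f"Debug: {len(filtered)} items left after filtering")

    # Fail with a clear message instead of a ZeroDivisionError
    if not filtered:
        raise ValueError("No positive values to average")
    return sum(filtered) / len(filtered)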

2. Using assert

python
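def calculate_mean(data):
    # Assertions document assumptions and fail fast during development.
    # They are skipped when Python runs with the -O flag, so do not rely
    # on them to validate user input in production code.
    assert len(data) > 0, "Data cannot be empty"
    assert all(isinstance(x, (int, float)) for x in data), "Must be numeric"

    return sum(data) / len(data)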
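
Assertions are removed when Python is started with the `-O` flag, so checks on external input are often written as explicit exceptions instead. The sketch below shows one way this might look. The name `calculate_mean_checked` is hypothetical and chosen only for illustration.

```python
def calculate_mean_checked(data):
    """Return the arithmetic mean, validating input with exceptions.

    Hypothetical variant of calculate_mean: unlike assert statements,
    these checks still run when Python is started with -O.
    """
    if len(data) == 0:
        raise ValueError("Data cannot be empty")
    if not all(isinstance(x, (int, float)) for x in data):
        raise TypeError("Must be numeric")
    return sum(data) / len(data)


print(calculate_mean_checked([2, 4, 6]))  # 4.0

try:
    calculate_mean_checked([])
except ValueError as exc:
    print(f"Rejected: {exc}")  # Rejected: Data cannot be empty
```

A reasonable rule of thumb: use `assert` for conditions that indicate a bug in your own code, and exceptions for conditions that bad input can trigger.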

3. Logging

python
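import logging

# Show DEBUG and above; raise to logging.INFO or logging.WARNING to quiet the output
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

def process_data(df):
    # %-style arguments are only formatted if the message is actually emitted
    logger.info("Starting to process %d rows", len(df))
    df_clean = df.dropna()
    logger.info("%d rows remain after cleaning", len(df_clean))
    return df_clean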
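
One advantage of logging over `print` is that verbosity is controlled in one place rather than by deleting statements. The following is a minimal sketch of that idea. The logger name `"survey"` and the function `summarize` are hypothetical, and an in-memory stream stands in for a real log file so the output can be inspected.

```python
import io
import logging

# Hypothetical logger name; a StringIO stream stands in for a log file
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("survey")
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

def summarize(values):
    logger.debug("Received %d values", len(values))
    if not values:
        logger.warning("No values to summarize")
        return None
    return sum(values) / len(values)

logger.setLevel(logging.DEBUG)    # development: show everything
summarize([1, 2, 3])

logger.setLevel(logging.WARNING)  # production: hide debug messages
summarize([4, 5, 6])
summarize([])

print(stream.getvalue())
# DEBUG:Received 3 values
# WARNING:No values to summarize
```

The debug message appears only for the first call, because the level was raised before the second and third calls.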

Performance Optimization

1. Use Vectorization

python
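import numpy as np
import time

# Slow: element-by-element work in the Python interpreter
start = time.perf_counter()  # perf_counter is designed for measuring short intervals
result = [x ** 2 for x in range(1000000)]
print(f"Loop: {time.perf_counter() - start:.4f} seconds")

# Fast: NumPy applies the operation to the whole array in compiled code
start = time.perf_counter()
arr = np.arange(1000000, dtype=np.int64)  # explicit 64-bit integers so the squares cannot overflow
result = arr ** 2
print(f"NumPy: {time.perf_counter() - start:.4f} seconds")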

2. Use Generators

python
# Memory intensive: builds and stores all 1,000,000 values up front
squares = [x**2 for x in range(1000000)]

# Memory efficient: produces values one at a time, on demand
squares = (x**2 for x in range(1000000))
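
To make the memory difference concrete, the sketch below compares the two forms with `sys.getsizeof`. The variable names and the smaller size `n` are chosen here for illustration. It also shows the main trade-off of a generator: it can be consumed only once.

```python
import sys

n = 100_000

squares_list = [x**2 for x in range(n)]   # all values stored at once
squares_gen = (x**2 for x in range(n))    # values computed on demand

# getsizeof reports the container only, not the integers it refers to,
# but the difference in scale is still clear
print(f"list:      {sys.getsizeof(squares_list)} bytes")
print(f"generator: {sys.getsizeof(squares_gen)} bytes")

# A generator can be consumed only once
total = sum(squares_gen)
print(total == sum(squares_list))  # True
print(sum(squares_gen))            # 0, because the generator is exhausted
```

If the values are needed more than once, or need to be indexed, a list is still the right choice.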

3. Cache Results

python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
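
To check that the cache is doing its job, functions wrapped with `lru_cache` expose a `cache_info()` method. The sketch below repeats the function above and inspects the counters after a single call.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(30))  # 832040

# Each distinct n from 0 to 30 is computed exactly once
info = fibonacci.cache_info()
print(info.misses)    # 31
print(info.hits)      # 28
```

Without the decorator, the same call would make over a million recursive calls. Note that `maxsize=None` lets the cache grow without limit, so a bounded `maxsize` may be safer for functions called with many distinct arguments.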

Performance Profiling

Use %timeit (Jupyter)

python
%timeit sum([x**2 for x in range(1000)])
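
`%timeit` is an IPython magic, so it is not available in a plain `.py` script. The standard-library `timeit` module gives similar measurements. A minimal sketch, with the run counts chosen arbitrarily for illustration:

```python
import timeit

# Roughly what %timeit does: run the statement many times and report timing
elapsed = timeit.timeit("sum([x**2 for x in range(1000)])", number=1000)
print(f"{elapsed / 1000 * 1e6:.1f} microseconds per run")

# repeat() returns one total per round; the minimum is usually the least noisy
rounds = timeit.repeat("sum(x**2 for x in range(1000))", number=1000, repeat=3)
print(f"best of 3: {min(rounds) / 1000 * 1e6:.1f} microseconds per run")
```

The actual numbers depend on the machine, so compare alternatives on the same system rather than against published figures.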

Use cProfile

python
import cProfile

cProfile.run('your_function()')
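
The raw output of `cProfile.run` can be long. One common approach is to collect the statistics with `cProfile.Profile` and use `pstats` to sort and trim them. In the sketch below, `slow_square_sum` is a hypothetical workload standing in for `your_function()`.

```python
import cProfile
import io
import pstats

def slow_square_sum(n):
    # Hypothetical workload standing in for your_function()
    total = 0
    for x in range(n):
        total += x ** 2
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_square_sum(200_000)
profiler.disable()

# Sort by cumulative time and print only the ten most expensive entries
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

Start with the rows that have the highest cumulative time: they show where optimization is most likely to pay off.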

Common Performance Issues

1. Avoid Repeated Calculations

python
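# Assumes df is a pandas DataFrame with a numeric 'income' column

# Slow
for i in range(len(df)):
    if df.iloc[i]['income'] > df['income'].mean():  # Recalculates the mean on every iteration
        print(i)

# Faster
mean_income = df['income'].mean()  # Calculate once
for i in range(len(df)):
    if df.iloc[i]['income'] > mean_income:
        print(i)

# Fastest: drop the row loop and compare the whole column at once
above_mean = df[df['income'] > mean_income]  # rows with above-average income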

2. Use Built-in Functions

python
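numbers = [3, 1, 4, 1, 5]  # example input

# Slow: explicit Python loop
total = 0
for x in numbers:
    total += x

# Fast: in CPython the built-in runs the same loop in compiled C code
total = sum(numbers)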

Practical Checklist

Code Quality

  • [ ] Clear variable naming
  • [ ] Function documentation
  • [ ] Comprehensive exception handling
  • [ ] Standard code formatting

Performance

  • [ ] Use vectorization
  • [ ] Avoid repeated calculations
  • [ ] Use appropriate data structures

Final Section

Next Section: Git Version Control Basics

Continue!

Released under the MIT License. Content © Author.