11.3 Debugging and Profiling
Making Code Run Correctly and Quickly
Debugging Techniques
1. Print Debugging (Simplest)
python
def process_survey(data):
    print(f"Debug: Input data type {type(data)}, length {len(data)}")
    filtered = [x for x in data if x > 0]
    print(f"Debug: After filtering {len(filtered)} items")
    return sum(filtered) / len(filtered)
2. Using assert
python
def calculate_mean(data):
    # Note: assert statements are skipped when Python runs with the -O flag
    assert len(data) > 0, "Data cannot be empty"
    assert all(isinstance(x, (int, float)) for x in data), "Must be numeric"
    return sum(data) / len(data)
3. Logging
python
import logging
logging.basicConfig(level=logging.DEBUG)
def process_data(df):
    logging.info(f"Starting to process {len(df)} rows")
    df_clean = df.dropna()
    logging.info(f"After cleaning {len(df_clean)} rows")
    return df_clean
Performance Optimization
1. Use Vectorization
python
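import numpy as np
import time
# Slow: Python-level loop (list comprehension)
start = time.perf_counter()  # perf_counter is the clock intended for timing code
result = [x ** 2 for x in range(1000000)]
print(f"Loop: {time.perf_counter() - start:.4f} seconds")
# Fast: vectorization
start = time.perf_counter()
arr = np.arange(1000000, dtype=np.int64)  # int64 avoids overflow when squaring
result = arr ** 2
print(f"NumPy: {time.perf_counter() - start:.4f} seconds")
2. Use Generators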
python
# Memory intensive: builds the entire list in memory at once
squares = [x**2 for x in range(1000000)]
# Memory efficient: values are produced lazily, one at a time
squares = (x**2 for x in range(1000000))
3. Cache Results
python
from functools import lru_cache
@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
Performance Profiling
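Measure before optimizing. As a rough sketch, the standard-library `timeit` module can time a function in a plain script, outside a notebook. The function `sum_of_squares` and the repeat counts below are illustrative assumptions, not part of the original material.

```python
import timeit


def sum_of_squares(n):
    """Hypothetical stand-in for the code being measured."""
    return sum(x ** 2 for x in range(n))


# repeat() returns one total time per run; each run calls the function `number` times
calls_per_run = 1000
timings = timeit.repeat(lambda: sum_of_squares(1000), number=calls_per_run, repeat=5)

# The minimum is usually the most stable figure, since it is least
# disturbed by other processes on the machine
best_per_call = min(timings) / calls_per_run
print(f"Best of 5 runs: {best_per_call * 1e6:.1f} microseconds per call")
```

Taking the minimum of several runs, rather than the mean, tends to give more repeatable numbers on a busy machine.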
Use %timeit (Jupyter)
python
%timeit sum([x**2 for x in range(1000)])
Use cProfile
python
import cProfile
# your_function is a placeholder for the function you want to profile
cProfile.run('your_function()')
Common Performance Issues
1. Avoid Repeated Calculations
python
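import pandas as pd
df = pd.DataFrame({'income': [32000, 54000, 41000, 78000]})  # sample data for illustration
# Slow: recomputes the mean on every iteration
for i in range(len(df)):
    if df.iloc[i]['income'] > df['income'].mean():  # Calculates mean every time
        print(i)
# Fast: computes the mean once, outside the loop
mean_income = df['income'].mean()  # Calculate once
for i in range(len(df)):
    if df.iloc[i]['income'] > mean_income:
        print(i)
2. Use Built-in Functions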
python
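numbers = [3, 1, 4, 1, 5]  # sample data for illustration
# Slow: manual accumulation in a Python loop
total = 0
for x in numbers:
    total += x
# Fast: built-in sum
total = sum(numbers)
Practical Checklist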
Code Quality
- [ ] Clear variable naming
- [ ] Function documentation
- [ ] Comprehensive exception handling
- [ ] Standard code formatting
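As a minimal sketch of the code-quality items above, the function below combines descriptive names, a docstring, and explicit exception handling. The name `mean_positive_income` and the sample values are hypothetical, chosen only for illustration.

```python
def mean_positive_income(incomes):
    """Return the mean of the positive values in `incomes`.

    Raises ValueError if there are no positive values to average.
    """
    positive_incomes = [value for value in incomes if value > 0]
    if not positive_incomes:
        raise ValueError("no positive income values to average")
    return sum(positive_incomes) / len(positive_incomes)


# The caller decides how to react to bad input instead of crashing
for sample in ([52000, -1, 48000], []):
    try:
        print(mean_positive_income(sample))
    except ValueError as error:
        print(f"Could not compute mean: {error}")
```

Raising a specific exception with a clear message gives the caller more to work with than an unexplained `ZeroDivisionError` from deep inside the function.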
Performance
- [ ] Use vectorization
- [ ] Avoid repeated calculations
- [ ] Use appropriate data structures
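To illustrate the last item, here is a rough sketch comparing membership tests on a list and on a set. The data (`ids_list`, `ids_set`) is made up for the example; actual timings will vary by machine.

```python
import timeit

# Hypothetical data: the same 100,000 respondent IDs stored two ways
ids_list = list(range(100_000))
ids_set = set(ids_list)

target = 99_999  # worst case for the list: the last element

# A list is scanned element by element; a set uses a hash lookup
list_time = timeit.timeit(lambda: target in ids_list, number=100)
set_time = timeit.timeit(lambda: target in ids_set, number=100)

print(f"list lookup: {list_time:.6f} seconds for 100 checks")
print(f"set lookup:  {set_time:.6f} seconds for 100 checks")
```

When code checks membership repeatedly, converting the collection to a set once is often worth the one-time cost.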