Module 8 Summary and Review
Exception Handling and Debugging — Making Code More Robust
Knowledge Point Summary
1. Error Types
Syntax Errors vs Runtime Exceptions:
- Syntax Error: The code breaks Python's grammar rules, so the program is rejected before it starts
- Exception: An error raised while the program is running
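A minimal sketch of the difference (the values here are illustrative):

```python
# A syntax error is caught before the program runs at all:
#     if True print("hi")      # SyntaxError: invalid syntax
# An exception is raised while the program is running:
try:
    result = 10 / 0            # raises ZeroDivisionError at runtime
except ZeroDivisionError:
    result = None
print(result)  # None
```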
Common Runtime Exceptions:
| Exception Type | Reason | Example |
|---|---|---|
| NameError | Variable not defined | print(undefined_var) |
| TypeError | Operation on incompatible types | "5" + 5 |
| ValueError | Right type, invalid value | int("abc") |
| IndexError | Index out of range | list[999] |
| KeyError | Dictionary key doesn't exist | dict['missing_key'] |
| FileNotFoundError | File doesn't exist | open('missing.txt') |
| ZeroDivisionError | Division by zero | 10 / 0 |
| AttributeError | Attribute doesn't exist | list.push() |
2. try-except Exception Handling
Basic Syntax:
```python
try:
    # Code that might raise an error
    risky_operation()
except ExceptionType:
    # Handle the error
    handle_error()
```

Complete Syntax:
```python
try:
    # Try to execute
    result = operation()
except ValueError:
    # Handle specific exception
    print("Value error")
except FileNotFoundError:
    # Handle another exception
    print("File not found")
except Exception as e:
    # Catch all other exceptions
    print(f"Other error: {e}")
else:
    # Executes only if no exception occurred
    print(f"Success: {result}")
finally:
    # Always executes (clean up resources)
    cleanup()
```

Multiple Exception Catching:
```python
# Method 1: Catch separately
try:
    operation()
except ValueError:
    handle_value_error()
except TypeError:
    handle_type_error()

# Method 2: Catch together
try:
    operation()
except (ValueError, TypeError) as e:
    handle_error(e)
```

3. Debugging Techniques
Debugging Levels:
- Print Debugging: Simplest; quick way to locate issues
- Assertions (assert): Verify assumptions about program state
- Logging: Persist debugging information
- Debugger (pdb): Step through code interactively
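The first three levels each get an example below; the debugger level can be sketched like this (a minimal illustration — the `breakpoint()` call is left commented out so the script still runs non-interactively):

```python
def running_total(values):
    total = 0
    for v in values:
        # Uncomment the next line to drop into pdb at this point (Python 3.7+).
        # Useful commands: n(ext), s(tep), p total, c(ontinue), q(uit)
        # breakpoint()
        total += v
    return total

print(running_total([1, 2, 3]))  # prints 6
```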
Print Debugging:
```python
def calculate_tax(income, rate):
    print(f"Debug: income={income}, rate={rate}")
    tax = income * rate
    print(f"Debug: tax={tax}")
    return tax
```

Assertions:
```python
def calculate_mean(data):
    assert len(data) > 0, "Data cannot be empty"
    assert all(isinstance(x, (int, float)) for x in data), "All values must be numbers"
    return sum(data) / len(data)
```

Logging:
```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s - %(levelname)s - %(message)s')

logging.debug("Detailed information")
logging.info("General information")
logging.warning("Warning")
logging.error("Error")
logging.critical("Critical error")
```

Common Mistakes
1. Catching Too Broad Exceptions
```python
# Wrong: Catch all exceptions
try:
    result = operation()
except:
    print("Error occurred")

# Correct: Catch specific exceptions
try:
    result = operation()
except ValueError:
    print("Value error")
except FileNotFoundError:
    print("File not found")
```

2. Ignoring Exception Information
```python
# Wrong: Exception information is discarded
try:
    operation()
except Exception:
    print("Error occurred")

# Correct: Capture exception information
try:
    operation()
except Exception as e:
    print(f"Error occurred: {e}")
    logging.error(f"Detailed error: {e}", exc_info=True)
```

3. Overusing try-except
```python
data = {'key': 'value'}

# Wrong: Using an exception for normal control flow
try:
    value = data['key']
except KeyError:
    value = None

# Correct: Use the dictionary's own lookup with a default
value = data.get('key', None)
```

Best Practices
1. Catch Specific Exceptions
```python
# Good habit
try:
    df = pd.read_csv('data.csv')
except FileNotFoundError:
    print("File not found")
except pd.errors.EmptyDataError:
    print("File is empty")
except Exception as e:
    print(f"Other error: {e}")
```

2. Use else Clause
```python
try:
    df = pd.read_csv('data.csv')
except FileNotFoundError:
    print("File not found")
else:
    # Runs only on success
    print(f"Successfully loaded {len(df)} rows")
```

3. finally for Resource Cleanup
```python
# Method 1: Using finally
file = None
try:
    file = open('data.txt', 'r')
    content = file.read()
except FileNotFoundError:
    print("File not found")
finally:
    if file:
        file.close()

# Method 2: Using with (better)
with open('data.txt', 'r') as file:
    content = file.read()
```

4. Log Complete Error Information
```python
import logging
import traceback

try:
    operation()
except Exception as e:
    logging.error(f"Error: {e}")
    logging.error(traceback.format_exc())
```

Programming Exercises
Exercise 1: Safe Data Reading Function (Basic)
Difficulty: ⭐⭐ Time: 15 minutes
```python
"""
Task: Create a safe CSV reading function
Requirements:
1. Check if file exists
2. Handle empty file case
3. Verify required columns exist
4. Return DataFrame or None
"""
import pandas as pd
from pathlib import Path

def safe_read_csv(filename, required_columns=None):
    """Safely read a CSV file"""
    # Your code
    pass

# Test
df = safe_read_csv('survey.csv', required_columns=['id', 'age', 'income'])
if df is not None:
    print(f"Success: {len(df)} rows")
```

Reference Answer
```python
import pandas as pd
from pathlib import Path

def safe_read_csv(filename, required_columns=None):
    """Safely read a CSV file

    Parameters:
        filename: Path to CSV file
        required_columns: List of required column names

    Returns:
        DataFrame or None
    """
    try:
        # Check if file exists
        file_path = Path(filename)
        if not file_path.exists():
            print(f"File not found: {filename}")
            return None

        # Read CSV
        df = pd.read_csv(file_path)

        # Check if empty
        if len(df) == 0:
            print(f"File is empty: {filename}")
            return None

        # Validate required columns
        if required_columns:
            missing_columns = set(required_columns) - set(df.columns)
            if missing_columns:
                print(f"Missing columns: {missing_columns}")
                print(f"Available columns: {list(df.columns)}")
                return None

        print(f"Successfully loaded: {len(df)} rows × {len(df.columns)} columns")
        return df

    except pd.errors.EmptyDataError:
        print(f"File corrupted or empty: {filename}")
        return None
    except pd.errors.ParserError as e:
        print(f"Parse error: {e}")
        return None
    except Exception as e:
        print(f"Unknown error: {e}")
        return None

# Test
if __name__ == "__main__":
    # Create test file
    test_data = pd.DataFrame({
        'id': [1, 2, 3],
        'age': [25, 30, 35],
        'income': [50000, 75000, 85000]
    })
    test_data.to_csv('test_survey.csv', index=False)

    # Test normal reading
    print("Test 1: Normal reading")
    df = safe_read_csv('test_survey.csv', required_columns=['id', 'age', 'income'])

    # Test file not found
    print("\nTest 2: File not found")
    df = safe_read_csv('missing.csv')

    # Test missing columns
    print("\nTest 3: Missing required columns")
    df = safe_read_csv('test_survey.csv', required_columns=['id', 'age', 'education'])
```

Exercise 2: Batch Data Cleaning (Basic)
Difficulty: ⭐⭐ Time: 20 minutes
```python
"""
Task: Clean survey data and handle various anomalies
Data issues:
- age may be a string or an invalid value
- income may be a string or negative
- Fields may be missing
"""
responses = [
    {'id': 1, 'age': '25', 'income': '50000'},
    {'id': 2, 'age': 'N/A', 'income': '75000'},
    {'id': 3, 'age': '35', 'income': 'unknown'},
    {'id': 4, 'age': '40', 'income': '85000'},
    {'id': 5, 'age': '150', 'income': '-1000'},
]

def clean_responses(responses):
    """Clean response data"""
    # Your code
    pass
```

Reference Answer
```python
def clean_responses(responses):
    """Clean response data

    Parameters:
        responses: List of response dictionaries

    Returns:
        (valid_data, errors): List of valid records and error list
    """
    valid_data = []
    errors = []

    for resp in responses:
        try:
            resp_id = resp.get('id', 'unknown')

            # Validate and convert age
            age_str = resp.get('age')
            if age_str is None:
                raise ValueError("Missing age field")
            age = int(age_str)
            if not (0 < age < 120):
                raise ValueError(f"Age out of range: {age}")

            # Validate and convert income
            income_str = resp.get('income')
            if income_str is None:
                raise ValueError("Missing income field")
            income = float(income_str)
            if income < 0:
                raise ValueError(f"Income is negative: {income}")

            # Create cleaned record
            clean_resp = {
                'id': resp_id,
                'age': age,
                'income': income
            }
            valid_data.append(clean_resp)
        except (ValueError, TypeError) as e:
            errors.append({
                'id': resp.get('id', 'unknown'),
                'error': str(e),
                'original_data': resp
            })

    return valid_data, errors

# Test
responses = [
    {'id': 1, 'age': '25', 'income': '50000'},
    {'id': 2, 'age': 'N/A', 'income': '75000'},
    {'id': 3, 'age': '35', 'income': 'unknown'},
    {'id': 4, 'age': '40', 'income': '85000'},
    {'id': 5, 'age': '150', 'income': '-1000'},
]

valid_data, errors = clean_responses(responses)

print(f"Valid data: {len(valid_data)} records")
for data in valid_data:
    print(f"  ID{data['id']}: {data['age']} years old, ${data['income']:,.0f}")

print(f"\nError data: {len(errors)} records")
for error in errors:
    print(f"  ID{error['id']}: {error['error']}")
```

Exercise 3: API Request with Retry (Intermediate)
Difficulty: ⭐⭐⭐ Time: 25 minutes
```python
"""
Task: Create an API request function with a retry mechanism
Requirements:
1. Support up to N retries
2. Handle timeout errors
3. Handle HTTP errors
4. Use exponential backoff (waiting time doubles each time)
5. Log each retry
"""
import requests
import time

def fetch_with_retry(url, max_retries=3, timeout=5):
    """API request with retry"""
    # Your code
    pass
```

Reference Answer
```python
import requests
import time
import logging

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(levelname)s - %(message)s')

def fetch_with_retry(url, max_retries=3, timeout=5):
    """API request with retry mechanism

    Parameters:
        url: API address
        max_retries: Maximum number of attempts
        timeout: Timeout in seconds

    Returns:
        Response data (dictionary) or None
    """
    for attempt in range(1, max_retries + 1):
        try:
            logging.info(f"Attempt {attempt}/{max_retries}: {url}")

            # Send request
            response = requests.get(url, timeout=timeout)

            # Raise HTTPError for 4xx/5xx status codes
            response.raise_for_status()

            # Parse JSON
            data = response.json()
            logging.info("Successfully retrieved data")
            return data

        except requests.exceptions.Timeout:
            if attempt < max_retries:
                wait_time = 2 ** (attempt - 1)  # Exponential backoff: 1, 2, 4, ...
                logging.warning(f"Timeout, waiting {wait_time} seconds before retry...")
                time.sleep(wait_time)
            else:
                logging.error("Maximum retries reached, giving up")
                return None
        except requests.exceptions.HTTPError as e:
            logging.error(f"HTTP error: {e}")
            logging.error(f"Status code: {e.response.status_code}")
            return None
        except requests.exceptions.RequestException as e:
            logging.error(f"Request error: {e}")
            return None
        except ValueError as e:
            logging.error(f"JSON parse error: {e}")
            return None

    return None

# Test
if __name__ == "__main__":
    # Test 1: Successful request (using a public API)
    print("Test 1: Normal request")
    data = fetch_with_retry('https://api.github.com/users/github')
    if data:
        print(f"Username: {data.get('login')}")
        print(f"Repository count: {data.get('public_repos')}")

    # Test 2: Timeout (using a slow endpoint)
    print("\nTest 2: Timeout retry")
    data = fetch_with_retry('https://httpbin.org/delay/10', max_retries=2, timeout=1)

    # Test 3: 404 error
    print("\nTest 3: HTTP error")
    data = fetch_with_retry('https://api.github.com/nonexistent', max_retries=2)
```

Exercise 4: Custom Exception Classes (Intermediate)
Difficulty: ⭐⭐⭐ Time: 30 minutes
```python
"""
Task: Create a survey data validation system using custom exceptions
Requirements:
1. Define multiple custom exception classes
2. Raise appropriate exceptions in validation functions
3. Handle them uniformly in the main program
"""
# Define custom exceptions
class SurveyValidationError(Exception):
    """Base exception for survey validation"""
    pass

class InvalidAgeError(SurveyValidationError):
    """Invalid age exception"""
    pass

# Continue defining other exceptions...
```

Reference Answer
```python
# Custom exception classes
class SurveyValidationError(Exception):
    """Base exception for survey validation"""
    pass

class InvalidAgeError(SurveyValidationError):
    """Invalid age exception"""
    def __init__(self, age, min_age=18, max_age=100):
        self.age = age
        self.min_age = min_age
        self.max_age = max_age
        super().__init__(f"Age {age} out of range [{min_age}, {max_age}]")

class InvalidIncomeError(SurveyValidationError):
    """Invalid income exception"""
    def __init__(self, income):
        self.income = income
        super().__init__(f"Invalid income: {income}")

class MissingFieldError(SurveyValidationError):
    """Missing required field exception"""
    def __init__(self, field_name):
        self.field_name = field_name
        super().__init__(f"Missing required field: {field_name}")

# Validation function
def validate_response(response, min_age=18, max_age=100):
    """Validate a single response

    Parameters:
        response: Response dictionary
        min_age: Minimum age
        max_age: Maximum age

    Raises:
        MissingFieldError: Missing field
        InvalidAgeError: Invalid age
        InvalidIncomeError: Invalid income
    """
    # Check required fields
    required_fields = ['id', 'age', 'income']
    for field in required_fields:
        if field not in response:
            raise MissingFieldError(field)

    # Validate age
    age = response['age']
    if not isinstance(age, (int, float)):
        raise InvalidAgeError(age, min_age, max_age)
    if not (min_age <= age <= max_age):
        raise InvalidAgeError(age, min_age, max_age)

    # Validate income
    income = response['income']
    if not isinstance(income, (int, float)):
        raise InvalidIncomeError(income)
    if income < 0:
        raise InvalidIncomeError(income)

# Batch validation
def validate_all_responses(responses, min_age=18, max_age=100):
    """Validate all responses

    Returns:
        (valid_responses, errors): Valid responses and error list
    """
    valid_responses = []
    errors = []

    for i, resp in enumerate(responses):
        try:
            validate_response(resp, min_age, max_age)
            valid_responses.append(resp)
        except MissingFieldError as e:
            errors.append({
                'index': i,
                'id': resp.get('id', 'unknown'),
                'error_type': 'MissingField',
                'error': str(e)
            })
        except InvalidAgeError as e:
            errors.append({
                'index': i,
                'id': resp.get('id', 'unknown'),
                'error_type': 'InvalidAge',
                'error': str(e),
                'value': e.age
            })
        except InvalidIncomeError as e:
            errors.append({
                'index': i,
                'id': resp.get('id', 'unknown'),
                'error_type': 'InvalidIncome',
                'error': str(e),
                'value': e.income
            })
        except Exception as e:
            errors.append({
                'index': i,
                'id': resp.get('id', 'unknown'),
                'error_type': 'Unknown',
                'error': str(e)
            })

    return valid_responses, errors

# Test
if __name__ == "__main__":
    responses = [
        {'id': 1, 'age': 25, 'income': 50000},   # Valid
        {'id': 2, 'age': 15, 'income': 30000},   # Age too low
        {'id': 3, 'income': 75000},              # Missing age
        {'id': 4, 'age': 35, 'income': -5000},   # Negative income
        {'id': 5, 'age': 150, 'income': 85000},  # Age too high
        {'id': 6, 'age': 30, 'income': 70000},   # Valid
    ]

    valid, errors = validate_all_responses(responses, min_age=18, max_age=100)

    print(f"Valid responses: {len(valid)} records")
    for resp in valid:
        print(f"  ID{resp['id']}: {resp['age']} years old, ${resp['income']:,}")

    print(f"\nError responses: {len(errors)} records")
    for error in errors:
        print(f"  ID{error['id']} ({error['error_type']}): {error['error']}")

    # Error type statistics
    print("\nError type statistics:")
    error_types = {}
    for error in errors:
        error_type = error['error_type']
        error_types[error_type] = error_types.get(error_type, 0) + 1
    for error_type, count in error_types.items():
        print(f"  {error_type}: {count} error(s)")
```

Exercise 5: Comprehensive Debugging Case (Advanced)
Difficulty: ⭐⭐⭐⭐ Time: 40 minutes
```python
"""
Task: Debug a data analysis script with multiple errors
Script functions:
1. Read CSV file
2. Clean data
3. Calculate statistics
4. Save results
Requirements:
1. Find all errors
2. Use try-except to handle them
3. Add logging
4. Add input validation
"""
# Code with errors (find and fix)
import pandas as pd

def analyze_survey(filename):
    # Read data
    df = pd.read_csv(filename)

    # Calculate average age
    avg_age = df['age'].mean()

    # Calculate income median
    median_income = df['imcome'].median()  # Spelling error

    # Filter high earners
    high_earners = df[df['income'] > median_income]

    # Save results
    high_earners.to_csv('high_earners.csv')

    return avg_age, median_income
```

Reference Answer (Fixed)
```python
import pandas as pd
import logging
from pathlib import Path

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('analysis.log'),
        logging.StreamHandler()
    ]
)

def analyze_survey(filename, output_filename='high_earners.csv'):
    """Analyze survey data (with complete error handling and logging)

    Parameters:
        filename: Input CSV file path
        output_filename: Output file path

    Returns:
        Dictionary of statistics or None
    """
    try:
        logging.info(f"Starting analysis: {filename}")

        # 1. Check if file exists
        file_path = Path(filename)
        if not file_path.exists():
            logging.error(f"File not found: {filename}")
            return None

        # 2. Read data
        try:
            df = pd.read_csv(file_path)
            logging.info(f"Successfully read: {len(df)} rows")
        except pd.errors.EmptyDataError:
            logging.error("File is empty")
            return None
        except pd.errors.ParserError as e:
            logging.error(f"CSV parse error: {e}")
            return None

        # 3. Validate required columns
        required_columns = ['age', 'income']
        missing_columns = set(required_columns) - set(df.columns)
        if missing_columns:
            logging.error(f"Missing columns: {missing_columns}")
            logging.error(f"Available columns: {list(df.columns)}")
            return None

        # 4. Data cleaning
        original_len = len(df)
        df = df.dropna(subset=['age', 'income'])
        df = df[(df['age'] > 0) & (df['age'] < 120)]
        df = df[df['income'] >= 0]
        cleaned_len = len(df)
        if cleaned_len < original_len:
            logging.warning(f"Removed {original_len - cleaned_len} rows of invalid data")
        if cleaned_len == 0:
            logging.error("No data left after cleaning")
            return None

        # 5. Calculate statistics
        try:
            avg_age = float(df['age'].mean())
            median_income = float(df['income'].median())
            logging.info(f"Average age: {avg_age:.1f}")
            logging.info(f"Median income: ${median_income:,.0f}")
        except Exception as e:
            logging.error(f"Statistics calculation error: {e}")
            return None

        # 6. Filter high earners
        try:
            high_earners = df[df['income'] > median_income].copy()
            logging.info(f"High earners: {len(high_earners)} people ({len(high_earners)/len(df)*100:.1f}%)")
        except Exception as e:
            logging.error(f"Filter error: {e}")
            return None

        # 7. Save results
        try:
            high_earners.to_csv(output_filename, index=False, encoding='utf-8-sig')
            logging.info(f"Results saved: {output_filename}")
        except Exception as e:
            logging.error(f"Save failed: {e}")
            return None

        # 8. Return statistics
        results = {
            'total_responses': cleaned_len,
            'avg_age': round(avg_age, 2),
            'median_income': round(median_income, 2),
            'high_earners_count': len(high_earners),
            'high_earners_percentage': round(len(high_earners)/len(df)*100, 2)
        }
        logging.info("Analysis complete")
        return results

    except Exception as e:
        logging.error(f"Unknown error: {e}", exc_info=True)
        return None

# Test
if __name__ == "__main__":
    # Create test data
    test_data = pd.DataFrame({
        'id': range(1, 11),
        'age': [25, 30, 35, 28, 32, 40, 27, 33, 38, 29],
        'income': [50000, 75000, 85000, 60000, 70000, 90000, 55000, 80000, 95000, 65000]
    })
    test_data.to_csv('test_survey.csv', index=False)

    # Run analysis
    results = analyze_survey('test_survey.csv')
    if results:
        print("\nAnalysis results:")
        print(f"  Total people: {results['total_responses']}")
        print(f"  Average age: {results['avg_age']}")
        print(f"  Median income: ${results['median_income']:,.0f}")
        print(f"  High earners: {results['high_earners_count']} people ({results['high_earners_percentage']}%)")
```

Next Steps
After completing this module, you have mastered:
- Common error types
- try-except exception handling
- Debugging techniques (print, assert, logging, pdb)
- Error logging
- Custom exceptions
Congratulations on completing Module 8!
In Module 9, we will learn about Data Science Core Libraries (NumPy, Pandas, Matplotlib, etc.).
Further Reading
- Python Official Documentation: Errors and Exceptions
- Python logging Module
- Real Python: Python Debugging
Ready to learn data science libraries?