
Module 8 Summary and Review

Exception Handling and Debugging — Making Code More Robust


Knowledge Point Summary

1. Error Types

Syntax Errors vs Runtime Exceptions:

  • Syntax Error: the code breaks Python's grammar rules, so the program cannot run at all
  • Exception: an error raised while an otherwise valid program is running (see the example below)
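
A minimal illustration of the difference (the values here are arbitrary):

python
# Runtime exception: the file is syntactically valid, so it starts running...
numbers = [1, 2, 3]
print(numbers[0])      # works and prints 1
# print(numbers[999])  # ...but this line, if uncommented, raises IndexError at run time

# Syntax error: the file never starts running at all
# print("hello"        # missing ")" -> SyntaxError is reported before any code executes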

Common Runtime Exceptions:

Exception Type      Reason                              Example
NameError           Variable is not defined             print(undefined_var)
TypeError           Operation on an incompatible type   "5" + 5
ValueError          Right type, but an invalid value    int("abc")
IndexError          Sequence index out of range         my_list[999]
KeyError            Dictionary key doesn't exist        my_dict['missing_key']
FileNotFoundError   File doesn't exist                  open('missing.txt')
ZeroDivisionError   Division by zero                    10 / 0
AttributeError      Attribute or method doesn't exist   my_list.push()
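
To see these in action, the table rows can be reproduced as short expressions. A minimal sketch (it uses try/except, summarized in the next section, so the script prints every error message instead of stopping at the first one):

python
# Each entry deliberately triggers the exception named in its key
samples = {
    "ValueError":        lambda: int("abc"),
    "KeyError":          lambda: {"a": 1}["missing_key"],
    "AttributeError":    lambda: [1, 2, 3].push(4),
    "ZeroDivisionError": lambda: 10 / 0,
}

for name, trigger in samples.items():
    try:
        trigger()
    except Exception as e:
        print(f"{name}: {e}")  # e.g. ValueError: invalid literal for int() with base 10: 'abc'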

2. try-except Exception Handling

Basic Syntax:

python
try:
    # Code that might raise an error
    risky_operation()
except ExceptionType:
    # Handle the error
    handle_error()

Complete Syntax:

python
try:
    # Try to execute
    result = operation()
except ValueError:
    # Handle specific exception
    print("Value error")
except FileNotFoundError:
    # Handle another exception
    print("File not found")
except Exception as e:
    # Catch all other exceptions
    print(f"Other error: {e}")
else:
    # Execute if no exception occurs
    print(f"Success: {result}")
finally:
    # Always execute (cleanup resources)
    cleanup()

Multiple Exception Catching:

python
# Method 1: Catch separately
try:
    operation()
except ValueError:
    handle_value_error()
except TypeError:
    handle_type_error()

# Method 2: Catch together
try:
    operation()
except (ValueError, TypeError) as e:
    handle_error(e)

3. Debugging Techniques

Debugging Levels:

  1. Print Debugging: Simplest option, quick to locate issues
  2. Assertions (assert): Verify assumptions about the data
  3. Logging: Persist debugging information
  4. Debugger (pdb): Interactive debugging (see the example after the Logging block)

Print Debugging:

python
def calculate_tax(income, rate):
    print(f"Debug: income={income}, rate={rate}")
    tax = income * rate
    print(f"Debug: tax={tax}")
    return tax

Assertions:

python
def calculate_mean(data):
    assert len(data) > 0, "Data cannot be empty"
    assert all(isinstance(x, (int, float)) for x in data), "All values must be numbers"
    return sum(data) / len(data)
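
A quick usage sketch of the function above; the second call is expected to fail:

python
print(calculate_mean([1, 2, 3]))  # 2.0
calculate_mean([])                # raises AssertionError: Data cannot be empty

Note that assertions are stripped when Python runs with the -O flag, so they are a development aid, not a substitute for real input validation.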

Logging:

python
import logging

logging.basicConfig(level=logging.DEBUG,
                   format='%(asctime)s - %(levelname)s - %(message)s')

logging.debug("Detailed information")
logging.info("General information")
logging.warning("Warning")
logging.error("Error")
logging.critical("Critical error")
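
Debugger (pdb):

The levels above also list the interactive debugger. A minimal sketch using the built-in breakpoint() (available since Python 3.7; it opens pdb by default), reusing the calculate_tax example from Print Debugging:

python
def calculate_tax(income, rate):
    breakpoint()          # execution pauses here and a (Pdb) prompt opens
    tax = income * rate   # at the prompt: "p income" and "p rate" inspect values
    return tax            # "n" steps to the next line, "c" continues, "q" quits

calculate_tax(50000, 0.25)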

Common Mistakes

1. Catching Overly Broad Exceptions

python
# Wrong: Catch all exceptions
try:
    result = operation()
except:
    print("Error occurred")

# Correct: Catch specific exceptions
try:
    result = operation()
except ValueError:
    print("Value error")
except FileNotFoundError:
    print("File not found")

2. Ignoring Exception Information

python
# Wrong: The exception details are discarded
try:
    operation()
except Exception:
    print("Error occurred")

# Correct: Capture the exception object and log it
try:
    operation()
except Exception as e:
    print(f"Error occurred: {e}")
    logging.error(f"Detailed error: {e}", exc_info=True)

3. Overusing try-except

python
# Wrong: Using an exception for an ordinary dictionary lookup
try:
    value = data['key']
except KeyError:
    value = None

# Correct: Use .get() with a default value
value = data.get('key', None)

Best Practices

1. Catch Specific Exceptions

python
# Good habit
try:
    df = pd.read_csv('data.csv')
except FileNotFoundError:
    print("File not found")
except pd.errors.EmptyDataError:
    print("File is empty")
except Exception as e:
    print(f"Other error: {e}")

2. Use else Clause

python
try:
    df = pd.read_csv('data.csv')
except FileNotFoundError:
    print("File not found")
else:
    # Only execute on success
    print(f"Successfully loaded {len(df)} rows")

3. finally for Resource Cleanup

python
# Method 1: Using finally
file = None
try:
    file = open('data.txt', 'r')
    content = file.read()
except FileNotFoundError:
    print("File not found")
finally:
    if file:
        file.close()

# Method 2: Using with (better)
with open('data.txt', 'r') as file:
    content = file.read()

4. Log Complete Error Information

python
import logging
import traceback

try:
    operation()
except Exception as e:
    logging.error(f"Error: {e}")
    logging.error(traceback.format_exc())
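
A common alternative with the same effect is logging.exception(), which logs at ERROR level and appends the traceback automatically when called inside an except block (operation() is the same placeholder as above):

python
try:
    operation()
except Exception:
    logging.exception("Error during operation")  # traceback is attached automatically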

Programming Exercises

Exercise 1: Safe Data Reading Function (Basic)

Difficulty: ⭐⭐ Time: 15 minutes

python
"""
Task: Create a safe CSV reading function

Requirements:
1. Check if file exists
2. Handle empty file case
3. Verify required columns exist
4. Return DataFrame or None
"""

import pandas as pd
from pathlib import Path

def safe_read_csv(filename, required_columns=None):
    """Safely read a CSV file"""
    # Your code
    pass

# Test
df = safe_read_csv('survey.csv', required_columns=['id', 'age', 'income'])
if df is not None:
    print(f"Success: {len(df)} rows")

Reference Answer

python
import pandas as pd
from pathlib import Path

def safe_read_csv(filename, required_columns=None):
    """Safely read a CSV file

    Parameters:
        filename: Path to CSV file
        required_columns: List of required column names

    Returns:
        DataFrame or None
    """
    try:
        # Check if file exists
        file_path = Path(filename)
        if not file_path.exists():
            print(f" File not found: {filename}")
            return None

        # Read CSV
        df = pd.read_csv(file_path)

        # Check if empty
        if len(df) == 0:
            print(f"File is empty: {filename}")
            return None

        # Validate required columns
        if required_columns:
            missing_columns = set(required_columns) - set(df.columns)
            if missing_columns:
                print(f" Missing columns: {missing_columns}")
                print(f"   Available columns: {list(df.columns)}")
                return None

        print(f" Successfully loaded: {len(df)} rows × {len(df.columns)} columns")
        return df

    except pd.errors.EmptyDataError:
        print(f" File corrupted or empty: {filename}")
        return None

    except pd.errors.ParserError as e:
        print(f" Parse error: {e}")
        return None

    except Exception as e:
        print(f" Unknown error: {e}")
        return None


# Test
if __name__ == "__main__":
    # Create test file
    test_data = pd.DataFrame({
        'id': [1, 2, 3],
        'age': [25, 30, 35],
        'income': [50000, 75000, 85000]
    })
    test_data.to_csv('test_survey.csv', index=False)

    # Test normal reading
    print("Test 1: Normal reading")
    df = safe_read_csv('test_survey.csv', required_columns=['id', 'age', 'income'])

    # Test file not found
    print("\nTest 2: File not found")
    df = safe_read_csv('missing.csv')

    # Test missing columns
    print("\nTest 3: Missing required columns")
    df = safe_read_csv('test_survey.csv', required_columns=['id', 'age', 'education'])

Exercise 2: Batch Data Cleaning (Basic)

Difficulty: ⭐⭐ Time: 20 minutes

python
"""
Task: Clean survey data and handle various anomalies

Data issues:
- age may be string or invalid value
- income may be string or negative
- Fields may be missing
"""

responses = [
    {'id': 1, 'age': '25', 'income': '50000'},
    {'id': 2, 'age': 'N/A', 'income': '75000'},
    {'id': 3, 'age': '35', 'income': 'unknown'},
    {'id': 4, 'age': '40', 'income': '85000'},
    {'id': 5, 'age': '150', 'income': '-1000'},
]

def clean_responses(responses):
    """Clean response data"""
    # Your code
    pass

Reference Answer

python
def clean_responses(responses):
    """Clean response data

    Parameters:
        responses: List of response dictionaries

    Returns:
        (valid_data, errors): List of valid data and error list
    """
    valid_data = []
    errors = []

    for resp in responses:
        try:
            resp_id = resp.get('id', 'unknown')

            # Validate and convert age
            age_str = resp.get('age')
            if age_str is None:
                raise ValueError("Missing age field")

            age = int(age_str)
            if not (0 < age < 120):
                raise ValueError(f"Age out of range: {age}")

            # Validate and convert income
            income_str = resp.get('income')
            if income_str is None:
                raise ValueError("Missing income field")

            income = float(income_str)
            if income < 0:
                raise ValueError(f"Income is negative: {income}")

            # Create cleaned data
            clean_resp = {
                'id': resp_id,
                'age': age,
                'income': income
            }
            valid_data.append(clean_resp)

        except (ValueError, TypeError) as e:
            errors.append({
                'id': resp.get('id', 'unknown'),
                'error': str(e),
                'original_data': resp
            })

    return valid_data, errors


# Test
responses = [
    {'id': 1, 'age': '25', 'income': '50000'},
    {'id': 2, 'age': 'N/A', 'income': '75000'},
    {'id': 3, 'age': '35', 'income': 'unknown'},
    {'id': 4, 'age': '40', 'income': '85000'},
    {'id': 5, 'age': '150', 'income': '-1000'},
]

valid_data, errors = clean_responses(responses)

print(f" Valid data: {len(valid_data)} records")
for data in valid_data:
    print(f"   ID{data['id']}: {data['age']} years old, ${data['income']:,.0f}")

print(f"\n Error data: {len(errors)} records")
for error in errors:
    print(f"   ID{error['id']}: {error['error']}")

Exercise 3: API Request with Retry (Intermediate)

Difficulty: ⭐⭐⭐ Time: 25 minutes

python
"""
Task: Create an API request function with retry mechanism

Requirements:
1. Support up to N retries
2. Handle timeout errors
3. Handle HTTP errors
4. Use exponential backoff (waiting time doubles each time)
5. Log each retry
"""

import requests
import time

def fetch_with_retry(url, max_retries=3, timeout=5):
    """API request with retry"""
    # Your code
    pass

Reference Answer

python
import requests
import time
import logging

logging.basicConfig(level=logging.INFO,
                   format='%(asctime)s - %(levelname)s - %(message)s')

def fetch_with_retry(url, max_retries=3, timeout=5):
    """API request with retry mechanism

    Parameters:
        url: API address
        max_retries: Maximum number of retries
        timeout: Timeout in seconds

    Returns:
        Response data (dictionary) or None
    """
    for attempt in range(1, max_retries + 1):
        try:
            logging.info(f"Attempt {attempt}/{max_retries}: {url}")

            # Send request
            response = requests.get(url, timeout=timeout)

            # Check HTTP status code
            response.raise_for_status()

            # Parse JSON
            data = response.json()

            logging.info(f" Successfully retrieved data")
            return data

        except requests.exceptions.Timeout:
            wait_time = 2 ** (attempt - 1)  # Exponential backoff: 1, 2, 4, 8...
            logging.warning(f"Timeout, waiting {wait_time} seconds before retry...")

            if attempt < max_retries:
                time.sleep(wait_time)
            else:
                logging.error(f" Maximum retries reached, giving up")
                return None

        except requests.exceptions.HTTPError as e:
            logging.error(f" HTTP error: {e}")
            logging.error(f"   Status code: {e.response.status_code}")
            return None

        except requests.exceptions.RequestException as e:
            logging.error(f" Request error: {e}")
            return None

        except ValueError as e:
            logging.error(f" JSON parse error: {e}")
            return None

    return None


# Test
if __name__ == "__main__":
    # Test 1: Successful request (using public API)
    print("Test 1: Normal request")
    data = fetch_with_retry('https://api.github.com/users/github')
    if data:
        print(f"Username: {data.get('login')}")
        print(f"Repository count: {data.get('public_repos')}")

    # Test 2: Timeout (using a slow address)
    print("\nTest 2: Timeout retry")
    data = fetch_with_retry('https://httpbin.org/delay/10', max_retries=2, timeout=1)

    # Test 3: 404 error
    print("\nTest 3: HTTP error")
    data = fetch_with_retry('https://api.github.com/nonexistent', max_retries=2)

Exercise 4: Custom Exception Classes (Intermediate)

Difficulty: ⭐⭐⭐ Time: 30 minutes

python
"""
Task: Create a survey data validation system using custom exceptions

Requirements:
1. Define multiple custom exception classes
2. Raise appropriate exceptions in validation functions
3. Handle them uniformly in main program
"""

# Define custom exceptions
class SurveyValidationError(Exception):
    """Base exception for survey validation"""
    pass

class InvalidAgeError(SurveyValidationError):
    """Age invalid exception"""
    pass

# Continue defining other exceptions...

Reference Answer

python
# Custom exception classes
class SurveyValidationError(Exception):
    """Base exception for survey validation"""
    pass

class InvalidAgeError(SurveyValidationError):
    """Age invalid exception"""
    def __init__(self, age, min_age=18, max_age=100):
        self.age = age
        self.min_age = min_age
        self.max_age = max_age
        super().__init__(f"Age {age} out of range [{min_age}, {max_age}]")

class InvalidIncomeError(SurveyValidationError):
    """Income invalid exception"""
    def __init__(self, income):
        self.income = income
        super().__init__(f"Invalid income: {income}")

class MissingFieldError(SurveyValidationError):
    """Missing required field exception"""
    def __init__(self, field_name):
        self.field_name = field_name
        super().__init__(f"Missing required field: {field_name}")


# Validation function
def validate_response(response, min_age=18, max_age=100):
    """Validate single response

    Parameters:
        response: Response dictionary
        min_age: Minimum age
        max_age: Maximum age

    Raises:
        MissingFieldError: Missing field
        InvalidAgeError: Invalid age
        InvalidIncomeError: Invalid income
    """
    # Check required fields
    required_fields = ['id', 'age', 'income']
    for field in required_fields:
        if field not in response:
            raise MissingFieldError(field)

    # Validate age
    age = response['age']
    if not isinstance(age, (int, float)):
        raise InvalidAgeError(age, min_age, max_age)
    if not (min_age <= age <= max_age):
        raise InvalidAgeError(age, min_age, max_age)

    # Validate income
    income = response['income']
    if not isinstance(income, (int, float)):
        raise InvalidIncomeError(income)
    if income < 0:
        raise InvalidIncomeError(income)


# Batch validation
def validate_all_responses(responses, min_age=18, max_age=100):
    """Validate all responses

    Returns:
        (valid_responses, errors): Valid responses and error list
    """
    valid_responses = []
    errors = []

    for i, resp in enumerate(responses):
        try:
            validate_response(resp, min_age, max_age)
            valid_responses.append(resp)

        except MissingFieldError as e:
            errors.append({
                'index': i,
                'id': resp.get('id', 'unknown'),
                'error_type': 'MissingField',
                'error': str(e)
            })

        except InvalidAgeError as e:
            errors.append({
                'index': i,
                'id': resp.get('id', 'unknown'),
                'error_type': 'InvalidAge',
                'error': str(e),
                'value': e.age
            })

        except InvalidIncomeError as e:
            errors.append({
                'index': i,
                'id': resp.get('id', 'unknown'),
                'error_type': 'InvalidIncome',
                'error': str(e),
                'value': e.income
            })

        except Exception as e:
            errors.append({
                'index': i,
                'id': resp.get('id', 'unknown'),
                'error_type': 'Unknown',
                'error': str(e)
            })

    return valid_responses, errors


# Test
if __name__ == "__main__":
    responses = [
        {'id': 1, 'age': 25, 'income': 50000},      # Valid
        {'id': 2, 'age': 15, 'income': 30000},      # Age too low
        {'id': 3, 'income': 75000},                 # Missing age
        {'id': 4, 'age': 35, 'income': -5000},      # Negative income
        {'id': 5, 'age': 150, 'income': 85000},     # Age too high
        {'id': 6, 'age': 30, 'income': 70000},      # Valid
    ]

    valid, errors = validate_all_responses(responses, min_age=18, max_age=100)

    print(f" Valid responses: {len(valid)} records")
    for resp in valid:
        print(f"   ID{resp['id']}: {resp['age']} years old, ${resp['income']:,}")

    print(f"\n Error responses: {len(errors)} records")
    for error in errors:
        print(f"   ID{error['id']} ({error['error_type']}): {error['error']}")

    # Error type statistics
    print(f"\nError type statistics:")
    error_types = {}
    for error in errors:
        error_type = error['error_type']
        error_types[error_type] = error_types.get(error_type, 0) + 1

    for error_type, count in error_types.items():
        print(f"   {error_type}: {count} error(s)")

Exercise 5: Comprehensive Debugging Case (Advanced)

Difficulty: ⭐⭐⭐⭐ Time: 40 minutes

python
"""
Task: Debug a data analysis script with multiple errors

Script functions:
1. Read CSV file
2. Clean data
3. Calculate statistics
4. Save results

Requirements:
1. Find all errors
2. Use try-except to handle them
3. Add logging
4. Add input validation
"""

# Code with errors (find and fix)
import pandas as pd

def analyze_survey(filename):
    # Read data
    df = pd.read_csv(filename)

    # Calculate average age
    avg_age = df['age'].mean()

    # Calculate income median
    median_income = df['imcome'].median()  # Spelling error

    # Filter high earners
    high_earners = df[df['income'] > median_income]

    # Save results
    high_earners.to_csv('high_earners.csv')

    return avg_age, median_income

Reference Answer (Fixed)

python
import pandas as pd
import logging
from pathlib import Path

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('analysis.log'),
        logging.StreamHandler()
    ]
)

def analyze_survey(filename, output_filename='high_earners.csv'):
    """Analyze survey data (with complete error handling and logging)

    Parameters:
        filename: Input CSV file path
        output_filename: Output file path

    Returns:
        Dictionary of statistics or None
    """
    try:
        logging.info(f"Starting analysis: {filename}")

        # 1. Check if file exists
        file_path = Path(filename)
        if not file_path.exists():
            logging.error(f"File not found: {filename}")
            return None

        # 2. Read data
        try:
            df = pd.read_csv(file_path)
            logging.info(f"Successfully read: {len(df)} rows")
        except pd.errors.EmptyDataError:
            logging.error("File is empty")
            return None
        except pd.errors.ParserError as e:
            logging.error(f"CSV parse error: {e}")
            return None

        # 3. Validate required columns
        required_columns = ['age', 'income']
        missing_columns = set(required_columns) - set(df.columns)
        if missing_columns:
            logging.error(f"Missing columns: {missing_columns}")
            logging.error(f"Available columns: {list(df.columns)}")
            return None

        # 4. Data cleaning
        original_len = len(df)
        df = df.dropna(subset=['age', 'income'])
        df = df[(df['age'] > 0) & (df['age'] < 120)]
        df = df[df['income'] >= 0]

        cleaned_len = len(df)
        if cleaned_len < original_len:
            logging.warning(f"Removed {original_len - cleaned_len} rows of invalid data")

        if cleaned_len == 0:
            logging.error("No data after cleaning")
            return None

        # 5. Calculate statistics
        try:
            avg_age = float(df['age'].mean())
            median_income = float(df['income'].median())

            logging.info(f"Average age: {avg_age:.1f}")
            logging.info(f"Income median: ${median_income:,.0f}")
        except Exception as e:
            logging.error(f"Statistics calculation error: {e}")
            return None

        # 6. Filter high earners
        try:
            high_earners = df[df['income'] > median_income].copy()
            logging.info(f"High earners: {len(high_earners)} people ({len(high_earners)/len(df)*100:.1f}%)")
        except Exception as e:
            logging.error(f"Filter error: {e}")
            return None

        # 7. Save results
        try:
            high_earners.to_csv(output_filename, index=False, encoding='utf-8-sig')
            logging.info(f"Results saved: {output_filename}")
        except Exception as e:
            logging.error(f"Save failed: {e}")
            return None

        # 8. Return statistics
        results = {
            'total_responses': cleaned_len,
            'avg_age': round(avg_age, 2),
            'median_income': round(median_income, 2),
            'high_earners_count': len(high_earners),
            'high_earners_percentage': round(len(high_earners)/len(df)*100, 2)
        }

        logging.info("Analysis complete")
        return results

    except Exception as e:
        logging.error(f"Unknown error: {e}", exc_info=True)
        return None


# Test
if __name__ == "__main__":
    # Create test data
    test_data = pd.DataFrame({
        'id': range(1, 11),
        'age': [25, 30, 35, 28, 32, 40, 27, 33, 38, 29],
        'income': [50000, 75000, 85000, 60000, 70000, 90000, 55000, 80000, 95000, 65000]
    })
    test_data.to_csv('test_survey.csv', index=False)

    # Run analysis
    results = analyze_survey('test_survey.csv')

    if results:
        print("\nAnalysis results:")
        print(f"  Total people: {results['total_responses']}")
        print(f"  Average age: {results['avg_age']}")
        print(f"  Income median: ${results['median_income']:,.0f}")
        print(f"  High earners: {results['high_earners_count']} people ({results['high_earners_percentage']}%)")

Next Steps

After completing this module, you have mastered:

  • Common error types
  • try-except exception handling
  • Debugging techniques (print, assert, logging, pdb)
  • Error logging
  • Custom exceptions

Congratulations on completing Module 8!

In Module 9, we will learn about Data Science Core Libraries (NumPy, Pandas, Matplotlib, etc.).

