Module 6 Summary and Review

Object-Oriented Programming Basics — Understanding Classes and Objects

Knowledge Summary

1. OOP Core Concepts

What is OOP?

Object-oriented programming (OOP) is a paradigm that bundles data together with the methods that operate on it.
Object: A collection of data + methods
Class: A template/blueprint for objects
Method: A function belonging to an object

Why do we need OOP?

Data and methods are naturally bound together
Code is more organized
Easier to reuse and maintain
Aligns with real-world modeling
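
As a minimal sketch of the first point, the hypothetical `Respondent` class below (its name, attributes, and method are illustrative assumptions, not part of the course materials) keeps a respondent's data and the function that uses it in one place:

```python
class Respondent:
    """Hypothetical example: data and behavior live together."""

    def __init__(self, name, income):
        self.name = name        # data
        self.income = income    # data

    def monthly_income(self):   # behavior that uses the data
        return self.income / 12


bob = Respondent("Bob", 60000)
print(bob.monthly_income())  # 5000.0
```

Without a class, the same logic would need a separate dictionary and a free-standing function that has to be told which dictionary to work on.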

Core Terminology:

Term	Definition	Example
Class	Object template	`class Student:`
Object/Instance	Concrete instance of a class	`alice = Student()`
Attribute	Object's data	`alice.name = "Alice"`
Method	Object's function	`alice.calculate_gpa()`
self	Refers to current object	`self.name`
Constructor	Initialize object	`__init__()`

2. Basic Class Structure

python

class ClassName:
    """Class docstring"""

    # Class attribute (shared by all objects)
    class_variable = "shared"

    def __init__(self, param1, param2):
        """Constructor"""
        self.param1 = param1  # Instance attribute
        self.param2 = param2

    def instance_method(self):
        """Instance method"""
        return self.param1

    @classmethod
    def class_method(cls):
        """Class method"""
        return cls.class_variable

    @staticmethod
    def static_method():
        """Static method"""
        return "Does not depend on class or instance"

Three Method Types:

Method Type	First Parameter	Access Instance Attributes	Access Class Attributes	Use Case
Instance method	`self`	✓	✓	Most common, operate on object data
Class method	`cls`	✗	✓	Factory methods, alternative constructors
Static method	None	✗	✗	Utility functions
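
The three method types in the table can be seen side by side in a short sketch. The method names `label`, `from_string`, and `is_valid_year` are hypothetical and chosen only for illustration:

```python
class Survey:
    def __init__(self, name, year):
        self.name = name
        self.year = year

    def label(self):
        """Instance method: uses this object's data."""
        return f"{self.name} ({self.year})"

    @classmethod
    def from_string(cls, text):
        """Class method used as an alternative constructor."""
        name, year = text.split(",")
        return cls(name.strip(), int(year))

    @staticmethod
    def is_valid_year(year):
        """Static method: a utility that needs neither self nor cls."""
        return 1900 <= year <= 2100


survey = Survey.from_string("Income Survey, 2024")
print(survey.label())              # Income Survey (2024)
print(Survey.is_valid_year(1800))  # False
```

Because `from_string` calls `cls(...)` rather than `Survey(...)`, a subclass that inherits it would get an instance of the subclass.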

3. Instance Attributes vs Class Attributes

python

class Survey:
    # Class attribute (shared by all objects)
    total_surveys = 0

    def __init__(self, name, year):
        # Instance attributes (unique to each object)
        self.name = name
        self.year = year
        Survey.total_surveys += 1  # Modify class attribute

# Usage
survey1 = Survey("Income Survey", 2024)
survey2 = Survey("Health Survey", 2024)

print(survey1.name)           # Income Survey (instance attribute)
print(Survey.total_surveys)   # 2 (class attribute)

Differences:

Instance attributes: Unique to each object, accessed via self.attr
Class attributes: Shared by all objects, accessed via ClassName.attr
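
One detail worth seeing in code: a class attribute can also be read through an instance, but assigning through an instance creates a new instance attribute instead of changing the shared value. A minimal sketch, using a stripped-down `Survey` class for illustration:

```python
class Survey:
    total_surveys = 0  # class attribute


s1 = Survey()
s2 = Survey()

print(s1.total_surveys)      # 0 (the read falls back to the class attribute)

s1.total_surveys = 99        # creates an instance attribute on s1 only
print(s1.total_surveys)      # 99
print(s2.total_surveys)      # 0
print(Survey.total_surveys)  # 0 (the class attribute is unchanged)
```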

4. Special Methods (Magic Methods)

Method	Purpose	Triggered By
`__init__()`	Constructor	`obj = Class()`
`__str__()`	String representation (user-friendly)	`print(obj)`
`__repr__()`	Developer representation	`repr(obj)`
`__len__()`	Length	`len(obj)`
`__getitem__()`	Index access	`obj[key]`
`__eq__()`	Equality comparison	`obj1 == obj2`

Example:

python

class Survey:
    def __init__(self, name):
        self.name = name
        self.responses = []

    def __str__(self):
        return f"Survey: {self.name} ({len(self.responses)} responses)"

    def __len__(self):
        return len(self.responses)

    def __getitem__(self, index):
        return self.responses[index]

# Usage
survey = Survey("Test")
survey.responses = [1, 2, 3]

print(survey)         # Survey: Test (3 responses)
print(len(survey))    # 3
print(survey[0])      # 1
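
The example above does not cover `__repr__` and `__eq__` from the table. The sketch below shows one possible way to write them; the two-field `Survey` class here is an assumption made for illustration and differs from the one above:

```python
class Survey:
    def __init__(self, name, year):
        self.name = name
        self.year = year

    def __repr__(self):
        return f"Survey(name={self.name!r}, year={self.year})"

    def __eq__(self, other):
        if not isinstance(other, Survey):
            return NotImplemented
        return (self.name, self.year) == (other.name, other.year)


a = Survey("Income Survey", 2024)
b = Survey("Income Survey", 2024)

print(repr(a))   # Survey(name='Income Survey', year=2024)
print(a == b)    # True (same data)
print(a is b)    # False (two different objects)
print([a])       # [Survey(name='Income Survey', year=2024)]
```

Containers such as lists display their elements with `__repr__`, which is why the last line prints the developer representation. Note also that a class defining `__eq__` without `__hash__` becomes unhashable, so its objects cannot be used as dictionary keys or set members.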

5. Encapsulation: Public vs Private

python

class BankAccount:
    def __init__(self, balance):
        self.balance = balance       # Public attribute
        self._transactions = []      # Private by convention (single underscore)
        self.__pin = "1234"          # Name-mangled (double underscore)

    def deposit(self, amount):
        """Public method"""
        self.balance += amount
        self._log_transaction("deposit", amount)

    def _log_transaction(self, kind, amount):
        """Private method (by convention)"""
        self._transactions.append({'type': kind, 'amount': amount})

Naming Conventions:

name: Public (directly accessible)
_name: Private by convention (external access is discouraged but still possible)
__name: Name-mangled (Python stores it as _ClassName__name, so it is harder, but not impossible, to reach from outside)
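
A short sketch of how the three naming styles behave from outside the class, reusing a trimmed-down `BankAccount`:

```python
class BankAccount:
    def __init__(self, balance):
        self.balance = balance
        self._transactions = []
        self.__pin = "1234"   # stored on the object as _BankAccount__pin


account = BankAccount(100)

print(account.balance)        # 100
print(account._transactions)  # [] (works, but discouraged)

try:
    account.__pin
except AttributeError:
    print("AttributeError: no attribute named __pin outside the class")

print(account._BankAccount__pin)  # 1234 (the mangled name is still reachable)
```

Name mangling is mainly a guard against accidental name clashes in subclasses, not a security mechanism.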

6. OOP in Data Science Applications

Pandas DataFrame:

python

import pandas as pd

df = pd.DataFrame({'age': [25, 30, 35]})

# Attributes
df.shape      # (3, 1)
df.columns    # Index(['age'])

# Methods
df.head()
df.mean()
df.to_csv('output.csv')

# Method chaining
result = (df
    .query('age > 25')
    .assign(age_squared=lambda x: x['age']**2)
    .sort_values('age')
)

Scikit-learn Models:

python

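from sklearn.linear_model import LinearRegression

# Illustrative data (not from the original): y = 2x
X = [[1], [2], [3]]
y = [2, 4, 6]
X_new = [[4]]

model = LinearRegression()  # Create object
model.fit(X, y)             # Train (method)
predictions = model.predict(X_new)  # Predict (method)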

# Access attributes
print(model.coef_)       # Coefficients
print(model.intercept_)  # Intercept

Python vs Stata vs R

Object-Oriented Comparison

Python (fully object-oriented):

python

df = pd.DataFrame({'x': [1, 2, 3]})
df.mean()           # Method call
df.shape            # Attribute access

R (partially object-oriented):

r

df <- data.frame(x = c(1, 2, 3))
mean(df$x)          # Function call
dim(df)             # Function call

Stata (procedural):

stata

* Stata is mainly command-based
summarize income
generate log_income = log(income)
regress y x1 x2

Common Errors

1. Forgetting the self Parameter

python

# Wrong
class Student:
    def __init__(name, age):  # Forgot self
        name = name  # Won't save to object

# Correct
class Student:
    def __init__(self, name, age):
        self.name = name
        self.age = age
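
To see what the mistake looks like at runtime, the sketch below instantiates a class whose `__init__` is missing `self`. The class name `BrokenStudent` is hypothetical. Python passes the new object as the first argument automatically, so the call supplies one argument more than the signature accepts:

```python
class BrokenStudent:
    def __init__(name, age):  # self is missing
        name = name


error = None
try:
    BrokenStudent("Alice", 20)
except TypeError as exc:
    error = exc
    print(f"TypeError: {exc}")
```

The message reports that `__init__` received three positional arguments although only two were declared; the extra one is the object itself.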

2. Confusing Instance and Class Attributes

python

# Wrong
class Counter:
    count = 0  # Class attribute

    def increment(self):
        count += 1  # UnboundLocalError: count is treated as a local variable; use self.count or Counter.count

# Correct
class Counter:
    count = 0

    def increment(self):
        Counter.count += 1  # Or self.__class__.count += 1
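
A related trap is writing `self.count += 1`, which runs without error but does not update the shared counter. The sketch below contrasts both cases; the class names are hypothetical. A bare `count += 1` raises `UnboundLocalError` (a subclass of `NameError`), while `self.count += 1` reads the class value and then stores the result as a new instance attribute:

```python
class BrokenCounter:
    count = 0

    def increment(self):
        count += 1  # count is treated as an unassigned local variable


class InstanceCounter:
    count = 0

    def increment(self):
        self.count += 1  # reads the class value, writes an instance attribute


caught = None
try:
    BrokenCounter().increment()
except UnboundLocalError as exc:
    caught = exc
    print(f"UnboundLocalError: {exc}")

a = InstanceCounter()
a.increment()
print(a.count)                # 1
print(InstanceCounter.count)  # 0 (the shared counter never moved)
```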

3. Directly Modifying Class Attributes Causes Unexpected Sharing

python

# Wrong
class Survey:
    responses = []  # Class attribute!

    def add_response(self, resp):
        self.responses.append(resp)  # All objects share the same list

# Correct
class Survey:
    def __init__(self):
        self.responses = []  # Instance attribute
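
The effect of the shared list is easiest to see with two objects. This is a minimal sketch with hypothetical class names `SharedSurvey` and `SeparateSurvey`:

```python
class SharedSurvey:
    responses = []  # one list object, stored on the class

    def add_response(self, resp):
        self.responses.append(resp)


class SeparateSurvey:
    def __init__(self):
        self.responses = []  # a new list for every object

    def add_response(self, resp):
        self.responses.append(resp)


a, b = SharedSurvey(), SharedSurvey()
a.add_response("yes")
print(b.responses)   # ['yes'] (b sees a's response)

c, d = SeparateSurvey(), SeparateSurvey()
c.add_response("yes")
print(d.responses)   # []
```

The problem only arises with mutable class attributes such as lists and dictionaries, because `append` changes the shared object in place rather than rebinding the name.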

4. Forgetting to Implement `__str__` Leads to Unfriendly Output

python

# Bad
class Student:
    def __init__(self, name):
        self.name = name

s = Student("Alice")
print(s)  # <__main__.Student object at 0x...>

# Good
class Student:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return f"Student(name='{self.name}')"

s = Student("Alice")
print(s)  # Student(name='Alice')

Best Practices

1. Use CapWords Naming for Classes

python

# Good
class StudentRecord:
    pass

class SurveyData:
    pass

# Bad
class student_record:
    pass

class surveydata:
    pass

2. Use snake_case Naming for Methods

python

class DataAnalyzer:
    def calculate_mean(self):  # ✓
        pass

    def CalculateMean(self):   # ✗
        pass

3. Use Docstrings

python

class Survey:
    """Survey class

    Manages survey data including adding responses, statistical analysis, etc.

    Attributes:
        name (str): Survey name
        year (int): Survey year
        responses (list): Response list
    """

    def __init__(self, name, year):
        self.name = name
        self.year = year
        self.responses = []
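
Docstrings are more than comments: Python stores them on the object as `__doc__`, and `help()` displays them. A small sketch, using a shortened docstring and a hypothetical `add_response` method:

```python
class Survey:
    """Survey class

    Manages survey data.
    """

    def add_response(self, response):
        """Add one response to the survey."""


print(Survey.__doc__.splitlines()[0])  # Survey class
print(Survey.add_response.__doc__)     # Add one response to the survey.
```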

4. Support Method Chaining

python

class DataPipeline:
    def __init__(self, data):
        self.data = data

    def remove_outliers(self):
        # Processing logic...
        return self  # Returning self makes the call chainable

    def standardize(self):
        # Processing logic...
        return self

    def filter_missing(self):
        # Processing logic...
        return self

# Method chaining
pipeline = (DataPipeline(data)
    .remove_outliers()
    .standardize()
    .filter_missing()
)

Programming Exercises

Exercise 1: Student Grade Management System (Basic)

Difficulty: ⭐⭐ Time: 20 minutes

Create a Student class.

Requirements:

python

class Student:
    """Student class"""

    def __init__(self, student_id, name, major):
        pass

    def add_grade(self, course, grade):
        """Add a grade"""
        pass

    def get_gpa(self):
        """Calculate GPA (assuming 100-point scale, convert to 4.0 scale)"""
        pass

    def __str__(self):
        return f"Student: {self.name} ({self.major}), GPA: {self.get_gpa():.2f}"

# Test
alice = Student(2024001, "Alice Wang", "Economics")
alice.add_grade("Microeconomics", 85)
alice.add_grade("Econometrics", 90)
alice.add_grade("Statistics", 78)

print(alice)
print(f"GPA: {alice.get_gpa():.2f}")

✅ Reference Solution

python

class Student:
    """Student class"""

    def __init__(self, student_id, name, major):
        self.student_id = student_id
        self.name = name
        self.major = major
        self.grades = {}  # {course: grade}

    def add_grade(self, course, grade):
        """Add a grade"""
        if not (0 <= grade <= 100):
            raise ValueError("Grade must be between 0-100")
        self.grades[course] = grade

    def get_gpa(self):
        """Calculate GPA (100-point to 4.0 scale conversion)"""
        if not self.grades:
            return 0.0

        # Conversion rules: 90-100=4.0, 80-89=3.0, 70-79=2.0, 60-69=1.0, <60=0.0
        total_points = 0
        for grade in self.grades.values():
            if grade >= 90:
                total_points += 4.0
            elif grade >= 80:
                total_points += 3.0
            elif grade >= 70:
                total_points += 2.0
            elif grade >= 60:
                total_points += 1.0
            else:
                total_points += 0.0

        return total_points / len(self.grades)

    def get_average_score(self):
        """Calculate average score"""
        if not self.grades:
            return 0.0
        return sum(self.grades.values()) / len(self.grades)

    def __str__(self):
        return f"Student: {self.name} ({self.major}), GPA: {self.get_gpa():.2f}"

    def __repr__(self):
        return f"Student(id={self.student_id}, name='{self.name}', courses={len(self.grades)})"


# Test
alice = Student(2024001, "Alice Wang", "Economics")
alice.add_grade("Microeconomics", 85)
alice.add_grade("Econometrics", 90)
alice.add_grade("Statistics", 78)

print(alice)                           # Student: Alice Wang (Economics), GPA: 3.00
print(f"Average score: {alice.get_average_score():.1f}")  # 84.3
print(repr(alice))                      # Student(id=2024001, name='Alice Wang', courses=3)

Exercise 2: Survey Data Container (Basic)

Difficulty: ⭐⭐ Time: 25 minutes

python

class SurveyData:
    """Survey data management class"""

    def __init__(self, survey_name):
        pass

    def add_response(self, response):
        """Add a response"""
        pass

    def _validate(self, response):
        """Private method: validate data"""
        pass

    def get_average_income(self):
        """Calculate average income"""
        pass

    def filter_by_age(self, min_age, max_age):
        """Filter by age"""
        pass

    def __len__(self):
        return len(self.responses)

    def __str__(self):
        return f"{self.survey_name}: {len(self)} responses"

# Test
survey = SurveyData("2024 Income Survey")
survey.add_response({'id': 1, 'age': 30, 'income': 75000})
survey.add_response({'id': 2, 'age': 35, 'income': 85000})

print(survey)
print(f"Average income: ${survey.get_average_income():,.0f}")

✅ Reference Solution

python

class SurveyData:
    """Survey data management class"""

    def __init__(self, survey_name):
        self.survey_name = survey_name
        self.responses = []

    def add_response(self, response):
        """Add a response"""
        if self._validate(response):
            self.responses.append(response)
            return True
        else:
            print(f"⚠️ Invalid data: {response}")
            return False

    def _validate(self, response):
        """Private method: validate data"""
        required_fields = ['id', 'age', 'income']

        # Check required fields
        if not all(field in response for field in required_fields):
            return False

        # Validate age
        if not (0 < response['age'] < 120):
            return False

        # Validate income
        if response['income'] < 0:
            return False

        return True

    def get_average_income(self):
        """Calculate average income"""
        if not self.responses:
            return 0
        incomes = [r['income'] for r in self.responses]
        return sum(incomes) / len(incomes)

    def filter_by_age(self, min_age, max_age):
        """Filter by age"""
        return [r for r in self.responses
                if min_age <= r['age'] <= max_age]

    def get_income_stats(self):
        """Income statistics"""
        if not self.responses:
            return {}

        incomes = [r['income'] for r in self.responses]
        return {
            'mean': sum(incomes) / len(incomes),
            'min': min(incomes),
            'max': max(incomes),
            'count': len(incomes)
        }

    def __len__(self):
        return len(self.responses)

    def __str__(self):
        return f"{self.survey_name}: {len(self)} responses"

    def __getitem__(self, index):
        """Support index access"""
        return self.responses[index]


# Test
survey = SurveyData("2024 Income Survey")

# Add valid data
survey.add_response({'id': 1, 'age': 30, 'income': 75000})
survey.add_response({'id': 2, 'age': 35, 'income': 85000})
survey.add_response({'id': 3, 'age': 45, 'income': 95000})

# Add invalid data (will be rejected)
survey.add_response({'id': 4, 'age': -5, 'income': 50000})  # Invalid age
survey.add_response({'id': 5, 'age': 28})  # Missing income field

print(survey)  # 2024 Income Survey: 3 responses
print(f"Average income: ${survey.get_average_income():,.0f}")
print(f"Ages 30-40: {len(survey.filter_by_age(30, 40))} people")
print(f"First record: {survey[0]}")

stats = survey.get_income_stats()
print("\nIncome statistics:")
print(f"  Sample size: {stats['count']}")
print(f"  Average: ${stats['mean']:,.0f}")
print(f"  Range: ${stats['min']:,} - ${stats['max']:,}")

Exercise 3: Data Analysis Pipeline (Intermediate)

Difficulty: ⭐⭐⭐ Time: 35 minutes

Create a data processing pipeline that supports method chaining.

python

class DataPipeline:
    """Data processing pipeline"""

    def __init__(self, data):
        pass

    def filter_by(self, condition):
        """Filter by condition (accepts a lambda or any other callable)"""
        pass

    def transform(self, func):
        """Transform data"""
        pass

    def group_by(self, key):
        """Group by"""
        pass

    def get_result(self):
        """Get result"""
        pass

    def summary(self):
        """Processing summary"""
        pass

# Test
data = [
    {'id': 1, 'age': 25, 'income': 50000, 'city': 'Beijing'},
    {'id': 2, 'age': 35, 'income': 80000, 'city': 'Shanghai'},
    # ...
]

result = (DataPipeline(data)
    .filter_by(lambda x: x['age'] >= 30)
    .transform(lambda x: {**x, 'income_万元': x['income'] / 10000})
    .get_result()
)

✅ Reference Solution

python

class DataPipeline:
    """Data processing pipeline"""

    def __init__(self, data):
        self.original_data = data.copy()
        self.data = data.copy()
        self.steps = []

    def filter_by(self, condition):
        """Filter by condition"""
        self.data = [item for item in self.data if condition(item)]
        self.steps.append(f"filter_by (kept {len(self.data)} records)")
        return self

    def transform(self, func):
        """Transform data"""
        self.data = [func(item) for item in self.data]
        self.steps.append("transform")
        return self

    def remove_field(self, *fields):
        """Remove fields"""
        self.data = [{k: v for k, v in item.items() if k not in fields}
                     for item in self.data]
        self.steps.append(f"remove_field({', '.join(fields)})")
        return self

    def add_field(self, field_name, func):
        """Add new field"""
        # Build new dicts so the caller's original records are not modified
        self.data = [{**item, field_name: func(item)} for item in self.data]
        self.steps.append(f"add_field('{field_name}')")
        return self

    def sort_by(self, key, reverse=False):
        """Sort"""
        self.data = sorted(self.data, key=key, reverse=reverse)
        self.steps.append(f"sort_by (reverse={reverse})")
        return self

    def limit(self, n):
        """Limit number"""
        self.data = self.data[:n]
        self.steps.append(f"limit({n})")
        return self

    def group_by(self, key_func):
        """Group by"""
        groups = {}
        for item in self.data:
            group_key = key_func(item)
            if group_key not in groups:
                groups[group_key] = []
            groups[group_key].append(item)

        # Convert to grouped result format
        self.data = [
            {'group': k, 'items': v, 'count': len(v)}
            for k, v in groups.items()
        ]
        self.steps.append(f"group_by ({len(self.data)} groups)")
        return self

    def get_result(self):
        """Get result"""
        return self.data

    def summary(self):
        """Processing summary"""
        print("=" * 50)
        print("Data Processing Pipeline Summary")
        print("=" * 50)
        print(f"Original data: {len(self.original_data)} records")
        print(f"After processing: {len(self.data)} records")
        print("\nProcessing steps:")
        for i, step in enumerate(self.steps, 1):
            print(f"  {i}. {step}")
        print("=" * 50)

    def __len__(self):
        return len(self.data)

    def __repr__(self):
        return f"DataPipeline(records={len(self.data)}, steps={len(self.steps)})"


# Test
data = [
    {'id': 1, 'age': 25, 'income': 50000, 'city': 'Beijing', 'gender': 'F'},
    {'id': 2, 'age': 35, 'income': 80000, 'city': 'Shanghai', 'gender': 'M'},
    {'id': 3, 'age': 45, 'income': 120000, 'city': 'Beijing', 'gender': 'F'},
    {'id': 4, 'age': 28, 'income': 65000, 'city': 'Guangzhou', 'gender': 'M'},
    {'id': 5, 'age': 32, 'income': 95000, 'city': 'Shanghai', 'gender': 'F'},
    {'id': 6, 'age': 40, 'income': 110000, 'city': 'Beijing', 'gender': 'M'},
]

# Example 1: Basic pipeline
print("Example 1: Filter age >= 30, convert income to 10k units")
result1 = (DataPipeline(data)
    .filter_by(lambda x: x['age'] >= 30)
    .add_field('income_万元', lambda x: round(x['income'] / 10000, 2))
    .remove_field('gender')
    .sort_by(lambda x: x['income'], reverse=True)
    .get_result()
)

for r in result1:
    print(f"  ID{r['id']}: {r['age']} years old, {r['city']}, {r['income_万元']} 万元")

# Example 2: Group statistics
print("\nExample 2: Group by city")
pipeline2 = DataPipeline(data)
result2 = (pipeline2
    .filter_by(lambda x: x['age'] >= 25)
    .group_by(lambda x: x['city'])
    .get_result()
)

for group in result2:
    avg_income = sum(item['income'] for item in group['items']) / len(group['items'])
    print(f"  {group['group']:12s}: {group['count']} people, average income ${avg_income:,.0f}")

pipeline2.summary()

# Example 3: Top N
print("\nExample 3: Top 3 highest incomes")
result3 = (DataPipeline(data)
    .sort_by(lambda x: x['income'], reverse=True)
    .limit(3)
    .get_result()
)

for i, r in enumerate(result3, 1):
    print(f"  {i}. ID{r['id']}: {r['age']} years old, ${r['income']:,}")
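
One point about the reference solution deserves a note: `data.copy()` on a list is a shallow copy. It creates a new list, but the dictionaries inside are still the same objects, so any step that modifies a record in place would also change the caller's records. The sketch below illustrates the difference with the standard library's `copy.deepcopy`; the one-record `data` list is made up for the example:

```python
import copy

data = [{'id': 1, 'income': 50000}]

shallow = data.copy()         # new list, same dict objects
deep = copy.deepcopy(data)    # new list and new dict objects

shallow[0]['income'] = 0      # also changes data[0]
print(data[0]['income'])      # 0
print(deep[0]['income'])      # 50000
```

A pipeline can avoid the issue either by deep-copying its input once or by having every step build new dictionaries instead of editing existing ones.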

Exercise 4: Simple Linear Regression Class (Advanced)

Difficulty: ⭐⭐⭐⭐ Time: 40 minutes

Implement a simple linear regression class, mimicking Scikit-learn's API design.

python

class SimpleLinearRegression:
    """Simple linear regression"""

    def __init__(self):
        pass

    def fit(self, X, y):
        """Fit model"""
        pass

    def predict(self, X):
        """Predict"""
        pass

    def score(self, X, y):
        """Calculate R²"""
        pass

    def __repr__(self):
        pass

# Test
X = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

model = SimpleLinearRegression()
model.fit(X, y)
print(model)  # Display slope and intercept

predictions = model.predict([6, 7, 8])
print(f"Predictions: {predictions}")

r2 = model.score(X, y)
print(f"R² = {r2:.3f}")

✅ Reference Solution

python

import numpy as np

class SimpleLinearRegression:
    """Simple linear regression (y = slope * x + intercept)"""

    def __init__(self):
        self.slope = None
        self.intercept = None
        self.is_fitted = False

    def fit(self, X, y):
        """Fit model

        Parameters:
            X: Independent variable (1D array)
            y: Dependent variable (1D array)

        Returns:
            self (supports method chaining)
        """
        X = np.array(X)
        y = np.array(y)

        if len(X) != len(y):
            raise ValueError("X and y must have the same length")

        # Calculate slope and intercept
        x_mean = X.mean()
        y_mean = y.mean()

        # slope = Σ((x - x̄)(y - ȳ)) / Σ((x - x̄)²)
        numerator = ((X - x_mean) * (y - y_mean)).sum()
        denominator = ((X - x_mean) ** 2).sum()

        if denominator == 0:
            raise ValueError("X has zero variance, cannot fit")

        self.slope = numerator / denominator
        self.intercept = y_mean - self.slope * x_mean
        self.is_fitted = True

        return self  # Support method chaining

    def predict(self, X):
        """Predict

        Parameters:
            X: Independent variable

        Returns:
            Array of predictions
        """
        if not self.is_fitted:
            raise ValueError("Model not trained, please call fit() first")

        X = np.array(X)
        return self.slope * X + self.intercept

    def score(self, X, y):
        """Calculate R² (coefficient of determination)

        R² = 1 - (SS_res / SS_tot)

        Parameters:
            X: Independent variable
            y: True values

        Returns:
            R² value (at most 1, closer to 1 is better; can be negative when the model fits worse than predicting the mean)
        """
        y = np.array(y)
        y_pred = self.predict(X)

        # Residual sum of squares
        ss_res = ((y - y_pred) ** 2).sum()

        # Total sum of squares
        ss_tot = ((y - y.mean()) ** 2).sum()

        if ss_tot == 0:
            return 0.0

        return 1 - (ss_res / ss_tot)

    def get_residuals(self, X, y):
        """Calculate residuals"""
        y_pred = self.predict(X)
        return np.array(y) - y_pred

    def summary(self):
        """Print model summary"""
        if not self.is_fitted:
            print("Model not trained")
            return

        print("=" * 50)
        print("Simple Linear Regression Model Summary")
        print("=" * 50)
        print(f"Slope:     {self.slope:.4f}")
        print(f"Intercept: {self.intercept:.4f}")
        print(f"Equation: y = {self.slope:.4f}x + {self.intercept:.4f}")
        print("=" * 50)

    def __repr__(self):
        if not self.is_fitted:
            return "SimpleLinearRegression(unfitted)"
        return f"SimpleLinearRegression(slope={self.slope:.4f}, intercept={self.intercept:.4f})"

    def __str__(self):
        if not self.is_fitted:
            return "Untrained model"
        return f"y = {self.slope:.4f}x + {self.intercept:.4f}"


# Test
print("=" * 60)
print("Simple Linear Regression Test")
print("=" * 60)

# Data 1: Perfect linear relationship
print("\nTest 1: Perfect linear relationship (y = 2x)")
X1 = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]

model1 = SimpleLinearRegression()
model1.fit(X1, y1)
print(model1)
model1.summary()

predictions1 = model1.predict([6, 7, 8])
print(f"Predictions for x=[6,7,8]: {predictions1}")
print(f"R² = {model1.score(X1, y1):.4f}")

# Data 2: Linear relationship with noise
print("\nTest 2: Linear relationship with noise")
X2 = [1, 2, 3, 4, 5]
y2 = [2, 4, 5, 4, 5]

model2 = SimpleLinearRegression()
model2.fit(X2, y2)
print(model2)

predictions2 = model2.predict([6, 7, 8])
print(f"Predictions for x=[6,7,8]: {predictions2}")
print(f"R² = {model2.score(X2, y2):.4f}")

# Residual analysis
residuals = model2.get_residuals(X2, y2)
print(f"Residuals: {residuals}")

# Data 3: Income and years of education
print("\nTest 3: Income vs Years of Education")
education_years = [12, 14, 16, 18, 20]  # Years of education
income = [35000, 45000, 60000, 75000, 90000]  # Income

model3 = SimpleLinearRegression()
model3.fit(education_years, income)
model3.summary()

# Predict: Bachelor's (16 years), Master's (18 years), and PhD (20 years)
predictions3 = model3.predict([16, 18, 20])
print("\nPredicted income:")
print(f"  Bachelor's (16 years): ${predictions3[0]:,.0f}")
print(f"  Master's (18 years): ${predictions3[1]:,.0f}")
print(f"  PhD (20 years): ${predictions3[2]:,.0f}")
print(f"\nR² = {model3.score(education_years, income):.4f}")

print("\n" + "=" * 60)
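
To check the formulas by hand, the sketch below repeats the slope, intercept, and R² calculations for the Test 2 data in plain Python, without NumPy. It is only a worked verification of the arithmetic, not a replacement for the class above:

```python
X = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_mean = sum(X) / len(X)   # 3.0
y_mean = sum(y) / len(y)   # 4.0

# slope = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
numerator = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(X, y))  # 6.0
denominator = sum((xi - x_mean) ** 2 for xi in X)                       # 10.0

slope = numerator / denominator        # 0.6
intercept = y_mean - slope * x_mean    # about 2.2

# R^2 = 1 - SS_res / SS_tot
predictions = [slope * xi + intercept for xi in X]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))  # about 2.4
ss_tot = sum((yi - y_mean) ** 2 for yi in y)                    # 6.0
r2 = 1 - ss_res / ss_tot

print(f"slope={slope:.4f}, intercept={intercept:.4f}, R2={r2:.4f}")
# slope=0.6000, intercept=2.2000, R2=0.6000
```

These values should match what `model2` reports in Test 2, up to floating-point rounding.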

Next Steps

After completing this module, you have mastered:

OOP core concepts (class, object, method, attribute)
Special methods (__init__, __str__, __len__, etc.)
Encapsulation (public/private)
OOP applications in data science

Congratulations on completing Module 6!

In Module 7, we'll learn file operations, including reading and writing CSV, Excel, Stata, and other data files.
