Introduction to Object-Oriented Programming
Understanding the Secret of df.method(): Why Data Science Needs OOP
What is Object-Oriented Programming (OOP)?
Object-Oriented Programming (OOP) is a programming paradigm that organizes data and the methods that operate on that data together.
Why should social science students learn OOP?
You're already using OOP!
python
import pandas as pd
df = pd.DataFrame({'age': [25, 30, 35]})
result = df.mean()  # df is an object, mean() is a method

Core Concepts:
- Object: A collection of data + methods
- Class: A template/blueprint for objects
- Method: A function belonging to an object
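These three terms can be seen in plain Python without any extra library. The sketch below uses the built-in `str` type purely for illustration; the variable name `greeting` is an assumption and does not come from the examples in this section.

```python
# A string is an object: it bundles data (the characters) with methods.
greeting = "hello"

# type() reports the class the object was created from.
print(type(greeting))             # <class 'str'>

# upper() is a method: a function that belongs to the object.
print(greeting.upper())           # HELLO

# isinstance() checks whether an object is an instance of a class.
print(isinstance(greeting, str))  # True
```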
Why Do We Need OOP?
Scenario: Managing Respondent Information
Traditional Approach (Dictionary):
python
respondent1 = {
    'id': 1001,
    'name': 'Alice',
    'age': 30,
    'income': 75000
}

# Calculate net income (function separated)
def calculate_net_income(respondent, tax_rate=0.25):
    return respondent['income'] * (1 - tax_rate)

net = calculate_net_income(respondent1)

OOP Approach (Class):
python
class Respondent:
    def __init__(self, id, name, age, income):
        self.id = id
        self.name = name
        self.age = age
        self.income = income

    def calculate_net_income(self, tax_rate=0.25):
        """Data and methods together"""
        return self.income * (1 - tax_rate)

# Usage
resp1 = Respondent(1001, 'Alice', 30, 75000)
net = resp1.calculate_net_income()  # More intuitive!

Advantages:
- Data and methods bound together
- Code is more organized
- Easier to reuse and maintain
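As a rough illustration of the reuse point, one class definition can serve any number of respondents. The sketch below repeats the `Respondent` class so that it runs on its own; the second respondent (Bob) and the 0.30 tax rate are made-up values for demonstration.

```python
class Respondent:
    def __init__(self, id, name, age, income):
        self.id = id
        self.name = name
        self.age = age
        self.income = income

    def calculate_net_income(self, tax_rate=0.25):
        return self.income * (1 - tax_rate)


# Hypothetical sample data: Bob is invented for this example.
respondents = [
    Respondent(1001, 'Alice', 30, 75000),
    Respondent(1002, 'Bob', 45, 60000),
]

# Every object carries the same method, so a loop needs no extra helpers.
for r in respondents:
    print(r.name, r.calculate_net_income())

# The default tax rate can still be overridden per call.
print(respondents[0].calculate_net_income(tax_rate=0.30))
```

With the dictionary approach, the same loop would work only if every dictionary happened to contain an `'income'` key; the class makes that structure explicit in one place.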
Understanding Pandas' OOP
python
import pandas as pd

# DataFrame is a class
df = pd.DataFrame({'x': [1, 2, 3]})  # df is an instance (object) of DataFrame

# These are all methods (functions of the object)
df.head()             # View first few rows
df.describe()         # Descriptive statistics
df.mean()             # Mean
df.to_csv('out.csv')  # Save

# These are attributes (data of the object)
df.shape    # (3, 1)
df.columns  # Index(['x'])
df.dtypes   # Data types

Compared to R:
r
# R (more functional)
df <- data.frame(x = c(1, 2, 3))
head(df)     # Function call
mean(df$x)   # Function call
write.csv(df, 'out.csv')

python
# Python (more object-oriented)
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3]})
df.head()       # Method call
df['x'].mean()  # Method call
df.to_csv('out.csv')

Hands-On: Creating a Simple Class
Example 1: Student Class
python
class Student:
    """Student class"""

    def __init__(self, name, age, major):
        """Constructor: automatically called when creating an object"""
        self.name = name
        self.age = age
        self.major = major
        self.courses = []

    def enroll(self, course):
        """Add a course"""
        self.courses.append(course)
        print(f"{self.name} enrolled in {course}")

    def display_info(self):
        """Display information"""
        print(f"Name: {self.name}")
        print(f"Age: {self.age}")
        print(f"Major: {self.major}")
        print(f"Courses: {', '.join(self.courses)}")

# Usage
alice = Student("Alice", 25, "Economics")
alice.enroll("Microeconomics")
alice.enroll("Econometrics")
alice.display_info()

Example 2: Survey Response Class
python
class SurveyResponse:
    """Survey response class"""

    def __init__(self, respondent_id, age, gender, income):
        self.respondent_id = respondent_id
        self.age = age
        self.gender = gender
        self.income = income

    def is_valid(self):
        """Validate data"""
        return (
            18 <= self.age <= 100 and
            self.income >= 0 and
            self.gender in ['Male', 'Female', 'Other']
        )

    def income_category(self):
        """Income classification"""
        if self.income < 50000:
            return "Low Income"
        elif self.income < 100000:
            return "Middle Income"
        else:
            return "High Income"

    def summary(self):
        """Generate summary"""
        return f"ID{self.respondent_id}: {self.gender}, {self.age} years old, {self.income_category()}"

# Usage
resp1 = SurveyResponse(1001, 30, 'Female', 75000)
print(resp1.is_valid())         # True
print(resp1.income_category())  # Middle Income
print(resp1.summary())          # ID1001: Female, 30 years old, Middle Income

OOP Core Concepts
1. Classes and Objects
python
# Class is a template
class Dog:
    def bark(self):
        return "Woof!"

# Object is an instance
dog1 = Dog()  # Create an object
dog2 = Dog()  # Create another object
print(dog1.bark())  # Woof!
print(dog2.bark())  # Woof!

2. Attributes and Methods
python
class Respondent:
    def __init__(self, age, income):
        # Attributes (data)
        self.age = age
        self.income = income

    # Method (function)
    def is_high_earner(self):
        return self.income > 100000

resp = Respondent(30, 120000)
print(resp.age)               # Attribute access
print(resp.is_high_earner())  # Method call

3. The self Parameter
python
class Person:
    def __init__(self, name):
        self.name = name  # self refers to the current object

    def greet(self):
        # self allows the method to access the object's attributes
        return f"Hello, I'm {self.name}"

p1 = Person("Alice")
p2 = Person("Bob")
print(p1.greet())  # Hello, I'm Alice
print(p2.greet())  # Hello, I'm Bob

Object-Oriented vs Functional
Functional Style
python
def create_respondent(id, age, income):
    return {'id': id, 'age': age, 'income': income}

def calculate_tax(respondent, rate):
    return respondent['income'] * rate

def is_valid(respondent):
    return respondent['age'] >= 18

resp = create_respondent(1001, 30, 75000)
tax = calculate_tax(resp, 0.25)
valid = is_valid(resp)

Object-Oriented Style
python
class Respondent:
    def __init__(self, id, age, income):
        self.id = id
        self.age = age
        self.income = income

    def calculate_tax(self, rate=0.25):
        return self.income * rate

    def is_valid(self):
        return self.age >= 18

resp = Respondent(1001, 30, 75000)
tax = resp.calculate_tax()
valid = resp.is_valid()

When to use which?
- Functional: Simple scripts, data analysis
- OOP: Large projects, reusable components
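The two styles are not mutually exclusive. As a hedged sketch, the snippet below keeps the data in small objects but analyzes them with a plain function and a list comprehension; the sample values and the helper name `mean_income` are hypothetical and introduced only for illustration.

```python
class Respondent:
    def __init__(self, id, age, income):
        self.id = id
        self.age = age
        self.income = income

    def is_valid(self):
        return self.age >= 18


def mean_income(respondents):
    """Plain function: average income over a list of Respondent objects."""
    incomes = [r.income for r in respondents]
    return sum(incomes) / len(incomes)


# Made-up sample data for illustration only.
sample = [
    Respondent(1001, 30, 75000),
    Respondent(1002, 16, 0),
    Respondent(1003, 52, 45000),
]

# The object's method does the filtering; the plain function summarizes.
valid = [r for r in sample if r.is_valid()]
print(len(valid))          # 2
print(mean_income(valid))  # 60000.0
```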
OOP for Social Science Students
You don't need to master OOP in depth, but it helps to recognize it in the libraries you already use:
- A pandas DataFrame is an object
python
df.head()  # Call a method
df.shape   # Access an attribute
- Scikit-learn models are objects
python
from sklearn.linear_model import LinearRegression
# X, y, and X_test are assumed to be feature/target arrays defined earlier
model = LinearRegression()           # Create an object
model.fit(X, y)                      # Call a method
predictions = model.predict(X_test)  # Call another method
- Statsmodels also uses OOP
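python
import statsmodels.formula.api as smf
# df is assumed to be a DataFrame with income, education, and age columns
model = smf.ols('income ~ education + age', data=df)  # Create a model object
results = model.fit()                                 # fit() returns a results object
print(results.summary())

Practice Problems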
Exercise 1: Create a Course Class
python
# Create a Course class
# Attributes: name, credits, students (list)
# Methods:
# - add_student(student_name)
# - get_enrollment()  # Return number of students
# - display_roster()  # Display all students

course = Course("Econometrics", 4)
course.add_student("Alice")
course.add_student("Bob")
print(course.get_enrollment())  # 2
course.display_roster()

Exercise 2: Convert to Class
python
# Convert the following functional code to OOP style
def create_survey(name, year):
    return {'name': name, 'year': year, 'responses': []}

def add_response(survey, response):
    survey['responses'].append(response)

def get_response_count(survey):
    return len(survey['responses'])

# Convert to Survey class

Next Steps
In the next section, we'll look at classes and objects in more detail.
Keep going!