
Introduction to Object-Oriented Programming

Understanding the Secret of df.method() — Why Data Science Needs OOP


What is Object-Oriented Programming (OOP)?

Object-Oriented Programming (OOP) is a programming paradigm that bundles data together with the methods that operate on that data.

Why should social science students learn OOP?

You're already using OOP!

python
import pandas as pd

df = pd.DataFrame({'age': [25, 30, 35]})
result = df.mean()  # df is an object, mean() is a method

Core Concepts:

  • Object: A collection of data + methods
  • Class: A template/blueprint for objects
  • Method: A function belonging to an object
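
The same three terms apply to Python's built-in types. A minimal sketch using a plain list (the variable name `ages` is chosen only for illustration):

```python
# list is the class (the blueprint)
ages = [25, 30, 35]            # ages is an object (an instance of list)

print(type(ages))              # <class 'list'>
print(isinstance(ages, list))  # True

# append() is a method: a function attached to the list object
ages.append(40)
print(ages)                    # [25, 30, 35, 40]
```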

Why Do We Need OOP?

Scenario: Managing Respondent Information

Traditional Approach (Dictionary):

python
respondent1 = {
    'id': 1001,
    'name': 'Alice',
    'age': 30,
    'income': 75000
}

# Calculate net income (the function lives separately from the data)
def calculate_net_income(respondent, tax_rate=0.25):
    return respondent['income'] * (1 - tax_rate)

net = calculate_net_income(respondent1)

OOP Approach (Class):

python
class Respondent:
    def __init__(self, id, name, age, income):
        self.id = id
        self.name = name
        self.age = age
        self.income = income

    def calculate_net_income(self, tax_rate=0.25):
        """Return income after tax (the data and the method live together)"""
        return self.income * (1 - tax_rate)

# Usage
resp1 = Respondent(1001, 'Alice', 30, 75000)
net = resp1.calculate_net_income()  # More intuitive!

Advantages:

  • Data and methods bound together
  • Code is more organized
  • Easier to reuse and maintain
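
One way to see the reuse benefit: once the class exists, the same method works on every respondent. A minimal sketch that repeats the Respondent class from above so it runs on its own; the second and third respondents are made-up illustration data.

```python
class Respondent:
    def __init__(self, id, name, age, income):
        self.id = id
        self.name = name
        self.age = age
        self.income = income

    def calculate_net_income(self, tax_rate=0.25):
        return self.income * (1 - tax_rate)

# Hypothetical sample data for illustration
respondents = [
    Respondent(1001, 'Alice', 30, 75000),
    Respondent(1002, 'Bob', 45, 52000),
    Respondent(1003, 'Carol', 28, 98000),
]

# The same method is reused for every object
net_incomes = [r.calculate_net_income() for r in respondents]
print(net_incomes)  # [56250.0, 39000.0, 73500.0]
```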

Understanding Pandas' OOP

python
import pandas as pd

# DataFrame is a class
df = pd.DataFrame({'x': [1, 2, 3]})  # df is an instance (object) of DataFrame

# These are all methods (functions of the object)
df.head()          # View first few rows
df.describe()      # Descriptive statistics
df.mean()          # Mean
df.to_csv('out.csv')  # Save

# These are attributes (data of the object)
df.shape           # (3, 1)
df.columns         # Column labels (an Index containing 'x')
df.dtypes          # Data types
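
To see how one object can offer both methods and attributes, here is a toy class that imitates a very small part of that interface. `TinyColumn` is a hypothetical name invented for this sketch; it is not how pandas is actually implemented.

```python
class TinyColumn:
    """Toy stand-in for a single numeric column (not real pandas)."""

    def __init__(self, values):
        self.values = list(values)        # data stored on the object
        self.shape = (len(self.values),)  # attribute: plain data, no parentheses

    def mean(self):
        """Method: computes a result from the stored data."""
        return sum(self.values) / len(self.values)

col = TinyColumn([1, 2, 3])
print(col.shape)   # (3,)
print(col.mean())  # 2.0
```

Attributes are read without parentheses because they are stored values; methods need parentheses because they run code.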

Compared to R:

r
# R (more functional)
df <- data.frame(x = c(1, 2, 3))
head(df)           # Function call
mean(df$x)         # Function call
write.csv(df, 'out.csv')

python
# Python (more object-oriented)
df = pd.DataFrame({'x': [1, 2, 3]})
df.head()          # Method call
df['x'].mean()     # Method call
df.to_csv('out.csv')

Hands-On: Creating a Simple Class

Example 1: Student Class

python
class Student:
    """Student class"""

    def __init__(self, name, age, major):
        """Constructor: automatically called when creating an object"""
        self.name = name
        self.age = age
        self.major = major
        self.courses = []

    def enroll(self, course):
        """Add a course"""
        self.courses.append(course)
        print(f"{self.name} enrolled in {course}")

    def display_info(self):
        """Display information"""
        print(f"Name: {self.name}")
        print(f"Age: {self.age}")
        print(f"Major: {self.major}")
        print(f"Courses: {', '.join(self.courses)}")

# Usage
alice = Student("Alice", 25, "Economics")
alice.enroll("Microeconomics")
alice.enroll("Econometrics")
alice.display_info()

Example 2: Survey Response Class

python
class SurveyResponse:
    """Survey response class"""

    def __init__(self, respondent_id, age, gender, income):
        self.respondent_id = respondent_id
        self.age = age
        self.gender = gender
        self.income = income

    def is_valid(self):
        """Validate data"""
        return (
            18 <= self.age <= 100 and
            self.income >= 0 and
            self.gender in ['Male', 'Female', 'Other']
        )

    def income_category(self):
        """Income classification"""
        if self.income < 50000:
            return "Low Income"
        elif self.income < 100000:
            return "Middle Income"
        else:
            return "High Income"

    def summary(self):
        """Generate summary"""
        return f"ID{self.respondent_id}: {self.gender}, {self.age} years old, {self.income_category()}"

# Usage
resp1 = SurveyResponse(1001, 30, 'Female', 75000)
print(resp1.is_valid())         # True
print(resp1.income_category())  # Middle Income
print(resp1.summary())          # ID1001: Female, 30 years old, Middle Income
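
Because the validation logic travels with each object, cleaning a batch of responses becomes a one-line filter. A minimal sketch with a trimmed copy of the class so it runs on its own; the three responses are made-up illustration data.

```python
class SurveyResponse:
    def __init__(self, respondent_id, age, gender, income):
        self.respondent_id = respondent_id
        self.age = age
        self.gender = gender
        self.income = income

    def is_valid(self):
        return (
            18 <= self.age <= 100 and
            self.income >= 0 and
            self.gender in ['Male', 'Female', 'Other']
        )

# Hypothetical responses; the second one has an out-of-range age
responses = [
    SurveyResponse(1001, 30, 'Female', 75000),
    SurveyResponse(1002, 15, 'Male', 0),
    SurveyResponse(1003, 42, 'Other', 120000),
]

# Keep only the responses that pass their own validation check
valid = [r for r in responses if r.is_valid()]
print(len(valid))                        # 2
print([r.respondent_id for r in valid])  # [1001, 1003]
```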

OOP Core Concepts

1. Classes and Objects

python
# Class is a template
class Dog:
    def bark(self):
        return "Woof!"

# Object is an instance
dog1 = Dog()  # Create an object
dog2 = Dog()  # Create another object

print(dog1.bark())  # Woof!
print(dog2.bark())  # Woof!
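
Two objects built from the same class are separate things, each with its own data. A minimal sketch that extends the Dog class with a `name` and a `tricks` list; both attributes and the `learn` method are additions made up for this illustration.

```python
class Dog:
    def __init__(self, name):
        self.name = name
        self.tricks = []   # each dog gets its own list

    def learn(self, trick):
        self.tricks.append(trick)

dog1 = Dog("Rex")
dog2 = Dog("Fido")
dog1.learn("sit")          # only dog1 changes

print(dog1.tricks)   # ['sit']
print(dog2.tricks)   # []
print(dog1 is dog2)  # False
```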

2. Attributes and Methods

python
class Respondent:
    def __init__(self, age, income):
        # Attributes (data)
        self.age = age
        self.income = income

    # Method (function)
    def is_high_earner(self):
        return self.income > 100000

resp = Respondent(30, 120000)
print(resp.age)              # Attribute access
print(resp.is_high_earner()) # Method call

3. The self Parameter

python
class Person:
    def __init__(self, name):
        self.name = name  # self refers to the current object

    def greet(self):
        # self allows the method to access the object's attributes
        return f"Hello, I'm {self.name}"

p1 = Person("Alice")
p2 = Person("Bob")

print(p1.greet())  # Hello, I'm Alice
print(p2.greet())  # Hello, I'm Bob
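
It can help to see what Python does behind the scenes: calling a method on an object passes that object in as `self`. A minimal sketch showing that the two call styles give the same result:

```python
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, I'm {self.name}"

p1 = Person("Alice")

# These two calls are equivalent: Python passes p1 as self automatically
print(p1.greet())        # Hello, I'm Alice
print(Person.greet(p1))  # Hello, I'm Alice
```

In everyday code you would always write the first form; the second is shown only to make `self` less mysterious.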

Object-Oriented vs Functional

Functional Style

python
def create_respondent(id, age, income):
    return {'id': id, 'age': age, 'income': income}

def calculate_tax(respondent, rate):
    return respondent['income'] * rate

def is_valid(respondent):
    return respondent['age'] >= 18

resp = create_respondent(1001, 30, 75000)
tax = calculate_tax(resp, 0.25)
valid = is_valid(resp)

Object-Oriented Style

python
class Respondent:
    def __init__(self, id, age, income):
        self.id = id
        self.age = age
        self.income = income

    def calculate_tax(self, rate=0.25):
        return self.income * rate

    def is_valid(self):
        return self.age >= 18

resp = Respondent(1001, 30, 75000)
tax = resp.calculate_tax()
valid = resp.is_valid()

When to use which?

  • Functional: Short scripts and one-off analyses where a few standalone functions are enough
  • OOP: Larger projects, or reusable components where data and behavior need to stay together

OOP for Social Science Students

You don't need to master OOP, but you should understand that:

  1. A pandas DataFrame is an object

python
df.head()        # Call a method
df.shape         # Access an attribute

  2. Scikit-learn models are objects

python
from sklearn.linear_model import LinearRegression

# X, y, and X_test are assumed to be existing feature and target data
model = LinearRegression()  # Create an object
model.fit(X, y)             # Call a method
predictions = model.predict(X_test)

  3. Statsmodels also uses OOP

python
import statsmodels.formula.api as smf

# df is assumed to contain income, education, and age columns
model = smf.ols('income ~ education + age', data=df)
results = model.fit()
print(results.summary())
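
The fit-then-use pattern in these libraries works because the object remembers what it learned. A toy sketch of that idea; `MeanModel` is a hypothetical class made up for illustration and is not part of scikit-learn or statsmodels.

```python
class MeanModel:
    """Toy model that always predicts the mean of the training targets."""

    def fit(self, X, y):
        # Store what was learned as an attribute on the object
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # One prediction per input row, using the stored state
        return [self.mean_ for _ in X]

# Made-up training data for illustration
X = [[1], [2], [3]]
y = [10, 20, 30]

model = MeanModel()
model.fit(X, y)
print(model.mean_)                # 20.0
print(model.predict([[4], [5]]))  # [20.0, 20.0]
```

The value computed in `fit()` stays on the object, which is why `predict()` can use it later without being handed the training data again.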

Practice Problems

Exercise 1: Create a Course Class

python
# Create a Course class
# Attributes: name, credits, students (list)
# Methods:
#   - add_student(student_name)
#   - get_enrollment()  # Return number of students
#   - display_roster()  # Display all students

course = Course("Econometrics", 4)
course.add_student("Alice")
course.add_student("Bob")
print(course.get_enrollment())  # 2
course.display_roster()

Exercise 2: Convert to Class

python
# Convert the following functional code to OOP style

def create_survey(name, year):
    return {'name': name, 'year': year, 'responses': []}

def add_response(survey, response):
    survey['responses'].append(response)

def get_response_count(survey):
    return len(survey['responses'])

# Convert to Survey class

Next Steps

In the next section, we'll look at classes and objects in more detail.

Keep going!


Released under the MIT License. Content © Author.