Module 1: Python Introduction & Navigation

Why Choose Python? — The Programming Starting Point for Social Science Researchers

Module Overview

Before beginning your Python journey, the most important question is: Why learn Python? This module will help you understand Python's unique value from a social science researcher's perspective, compare it with Stata/R, and take your first steps in programming.

Learning Objectives

After completing this module, you will be able to:

Understand Python's advantages and application scenarios in social science research
Clearly identify the similarities, differences, and complementary relationships between Python, Stata, and R
Know when to use Python versus Stata/R
Successfully run your first Python program
Understand basic concepts of variables, functions, and data analysis
Build confidence and adopt the right mindset for learning Python

Module Contents

01 - Why Learn Python?

Core Question: What can Python bring to social science research?

Core Content:

Python's rise in academia (top journal acceptance, citation trends)
Five core advantages: free & open-source, rich ecosystem, versatility, active community, career value
Real research cases:
- Text analysis (Twitter sentiment analysis, transformers models)
- Causal inference (Double Machine Learning)
- Event studies (stock abnormal returns)
- Network analysis (social network centrality)
Python's limitations (honest discussion)
Academic journal policies (AER, QJE, Econometrica)

Why It Matters:

Build learning motivation: understand Python's unique value
Set reasonable expectations: know what can and cannot be done
Career development: master skills needed in both academia and industry

02 - Python vs Stata/R

Core Question: I already know Stata/R, why should I learn Python?

Core Content:

Basic syntax comparison (variable creation, loops, conditionals)
Data processing comparison (pandas vs data.table vs Stata)
Advanced operations comparison:
- Panel data regression (fixed effects)
- Categorical variable handling
- Time series operations
- String processing
Complete workflow comparison for Difference-in-Differences (DID)
Data visualization comparison (coefplot implementation in three languages)
Performance benchmarking (processing 1 million rows)
Ecosystem comparison (package management, community, learning resources)
Mixed-use strategies (when to use which tool)
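
As a taste of the basic syntax comparison, here is a minimal sketch of variable creation, a loop, and a conditional in Python, with rough Stata and R analogues noted in comments. The return rate, base wage, and 40,000 threshold are made-up illustration values, not figures from the course.

```python
# Variable creation
# Stata: local return_rate = 0.08    R: return_rate <- 0.08
return_rate = 0.08   # hypothetical return per extra year of schooling
base_wage = 30000    # hypothetical wage at 9 years of schooling

# Loop
# Stata: forvalues y = 9/16 { ... }    R: for (y in 9:16) { ... }
wages = {}
for years in range(9, 17):  # range() excludes the upper bound, so this is 9..16
    wages[years] = base_wage * (1 + return_rate) ** (years - 9)

# Conditional
# Stata and R both use if (...) { ... } else { ... } style blocks;
# Python also offers a one-line conditional expression, used here.
for years, wage in wages.items():
    label = "above" if wage > 40000 else "at or below"
    print(f"{years} years: {wage:,.0f} ({label} 40,000)")
```

Note that Python uses indentation rather than braces to mark the body of a loop, which is usually the first adjustment for Stata and R users.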

Three Tools' Positioning:

Stata: Econometrics standard, robustness checks, policy evaluation
R: Statistical modeling, academic plotting (ggplot2), biostatistics
Python: Big data, machine learning, text analysis, full-stack development

03 - Your First Python Program

Core Question: How do I start coding?

Core Content:

Hello World program
Basic concepts: variables, data types, operators
Simple calculation examples (education returns)
Complete case: Income inequality analysis
- Generate 5,000 samples
- Descriptive statistics (mean, median, standard deviation)
- Gini coefficient calculation
- OLS regression analysis
- Data visualization (income distribution histogram)
Production-grade code vs beginner code comparison
Practice exercises (with complete answers)
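
To give a sense of where this article is heading, the following is a minimal sketch of the first steps of the income inequality case, using only the Python standard library. The log-normal parameters, the seed, and the `gini` helper are assumptions made for illustration; the article itself may use different data and libraries.

```python
import random
import statistics


def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula on sorted data:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


random.seed(42)  # fixed seed so repeated runs give the same sample
# Hypothetical incomes drawn from a log-normal distribution
incomes = [random.lognormvariate(10.5, 0.6) for _ in range(5000)]

print("Hello, World!")
print(f"mean:   {statistics.mean(incomes):,.0f}")
print(f"median: {statistics.median(incomes):,.0f}")
print(f"stdev:  {statistics.stdev(incomes):,.0f}")
print(f"Gini:   {gini(incomes):.3f}")
```

Changing the second argument of `lognormvariate` (the spread of log income) and rerunning is a quick way to see how the Gini coefficient responds to a more or less unequal distribution.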

Learning Methods:

Hands-on practice: run every example
Modify parameters: understand code logic
Review outputs: build intuition
Progressive learning: from simple to complex

Quick Comparison of Three Languages

Dimension	Python	Stata	R
Learning Curve	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐
Econometrics	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐
Machine Learning	⭐⭐⭐⭐⭐	⭐⭐	⭐⭐⭐⭐
Text Analysis	⭐⭐⭐⭐⭐	⭐	⭐⭐⭐
Big Data Processing	⭐⭐⭐⭐⭐	⭐⭐	⭐⭐⭐
Data Visualization	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐⭐
Free & Open Source	Yes	No (commercial license)	Yes
Community Support	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐

How to Choose?

By Research Task

Task	Preferred Tool	Reason
OLS/Logit/Probit	Stata/Python	Stata is concise, Python is flexible
Panel Data/Fixed Effects	Stata	Most mature implementation
DID/RDD/PSM	Stata/R	Standardized workflow
Machine Learning Prediction	Python	Richest library ecosystem
Text Mining/NLP	Python	transformers, spaCy
Network Analysis	Python/R	NetworkX, igraph
Data Visualization	R/Python	ggplot2 strongest, Python second
Big Data (>10GB)	Python	Dask, PySpark

By Learning Stage

Weeks 1-2 (Getting Started):

Read all 3 articles in this module
Run all code examples
Complete practice exercises

Weeks 3-4 (Building Confidence):

Replicate a Stata analysis in Python
Try simple regression in Python
Compare Python and Stata outputs
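
For the simple regression step, a minimal sketch of a one-variable OLS fit written from the closed-form formulas might look like the following. The simulated schooling and wage data and the `ols_simple` helper are hypothetical; in practice you would more likely load your own dataset and use a regression library.

```python
import random


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = sample covariance of (x, y) divided by sample variance of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


random.seed(0)  # fixed seed for reproducibility
# Hypothetical data: log wage rises by 0.08 per year of schooling, plus noise
schooling = [random.randint(9, 20) for _ in range(1000)]
log_wage = [9.0 + 0.08 * s + random.gauss(0, 0.3) for s in schooling]

intercept, slope = ols_simple(schooling, log_wage)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

Exporting the same data and running a one-regressor regression in Stata should give matching point estimates, which makes this a convenient first cross-check between the two tools.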

Week 5 and Beyond (Going Deeper):

Learn syntax and data processing in Modules 3-11
Selectively learn advanced topics based on research needs
Use Python in actual research

Module Study Recommendations

Minimalist Learning Path

If time is limited, prioritize in this order:

Must-Learn (1-2 hours):

01 - Why Learn Python (reading)
03 - Your First Python Program (hands-on practice)

Important (2-3 hours):

02 - Python vs Stata/R (focus on comparison tables)
03 - Complete case: Income inequality analysis

Enhancement (optional reading):

01 - Real research cases section
02 - Mixed-use strategies

Common Questions

Q: I have no programming experience at all. Can I learn it?
A: Yes! This course assumes zero background and starts from the most basic concepts. With about an hour of practice a day, you can get started in two weeks.

Q: Does learning Python mean giving up Stata/R?
A: No! Each of the three tools has its own strengths, and they complement one another. You might start with Python as your primary tool and Stata/R as supplements, then gradually settle into the workflow that suits you best.

Q: My research mainly uses Stata. Is Python necessary?
A: It depends on your research needs. If you only do traditional econometrics (OLS, fixed effects), Stata is sufficient. If you work with text analysis, machine learning, or big data, Python is essential.

Q: How long before I can use Python for research?
A: About five weeks for simple research: basic syntax (2 weeks) → data processing (2 weeks) → regression analysis (1 week). Deep mastery takes 3-6 months.

Next Steps

After completing this module, you will:

Have clear motivation for learning Python
Understand Python's position in the research workflow
Successfully run your first data analysis program
Be well-prepared for deeper learning

In Module 2, we will configure development environments (Jupyter, VS Code, Colab) and choose the programming tool that suits you best.

In Module 3, we will systematically learn Python syntax, from variables and data types to loops and conditionals.

Ready? Let's begin this exciting programming journey!
