Module 1: Python Introduction & Navigation
Why Choose Python? — The Programming Starting Point for Social Science Researchers
Module Overview
Before beginning your Python journey, the most important question is: Why learn Python? This module will help you understand Python's unique value from a social science researcher's perspective, compare it with Stata/R, and take your first steps in programming.
Learning Objectives
After completing this module, you will be able to:
- Understand Python's advantages and application scenarios in social science research
- Clearly identify the similarities, differences, and complementary relationships between Python, Stata, and R
- Know when to use Python versus Stata/R
- Successfully run your first Python program
- Understand basic concepts of variables, functions, and data analysis
- Build confidence and adopt the right mindset for learning Python
Module Contents
01 - Why Learn Python?
Core Question: What can Python bring to social science research?
Core Content:
- Python's rise in academia (top journal acceptance, citation trends)
- Five core advantages: free & open-source, rich ecosystem, versatility, active community, career value
- Real research cases:
  - Text analysis (Twitter sentiment analysis, transformers models)
  - Causal inference (Double Machine Learning)
  - Event studies (stock abnormal returns)
  - Network analysis (social network centrality)
- Python's limitations (honest discussion)
- Academic journal policies (AER, QJE, Econometrica)
Why It Matters:
- Build learning motivation: understand Python's unique value
- Set reasonable expectations: know what can and cannot be done
- Career development: master skills needed in both academia and industry
02 - Python vs Stata/R
Core Question: I already know Stata/R, why should I learn Python?
Core Content:
- Basic syntax comparison (variable creation, loops, conditionals)
- Data processing comparison (Pandas vs data.table vs Stata)
- Advanced operations comparison:
  - Panel data regression (fixed effects)
  - Categorical variable handling
  - Time series operations
  - String processing
- Complete workflow comparison for Difference-in-Differences (DID)
- Data visualization comparison (coefplot implementation in three languages)
- Performance benchmarking (processing 1 million rows)
- Ecosystem comparison (package management, community, learning resources)
- Mixed-use strategies (when to use which tool)
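To give a flavor of the basic syntax comparison listed above, here is a minimal sketch of variable creation, a loop, and a conditional in plain Python. The records, the `education_level` helper, and its cut-offs are all hypothetical and exist only for illustration; the derived-variable step plays roughly the role that `gen` plays in Stata or `mutate` plays in R's dplyr.

```python
# Hypothetical survey records: (respondent id, years of schooling, monthly income)
records = [
    (1, 9, 3200.0),
    (2, 12, 4100.0),
    (3, 16, 6800.0),
    (4, 19, 9500.0),
]


def education_level(years):
    """Map years of schooling to a category (cut-offs are illustrative only)."""
    if years >= 16:
        return "college"
    elif years >= 12:
        return "high school"
    else:
        return "below high school"


# Loop over the records and create two derived variables per respondent
labelled = []
for person_id, years, income in records:
    labelled.append({
        "id": person_id,
        "level": education_level(years),
        "annual_income": income * 12,
    })

for row in labelled:
    print(row["id"], row["level"], row["annual_income"])
```

In real work this would usually be done on a pandas DataFrame rather than a list of tuples; the plain-Python version is used here only to keep the three syntax elements visible.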
Three Tools' Positioning:
- Stata: Econometrics standard, robustness checks, policy evaluation
- R: Statistical modeling, academic plotting (ggplot2), biostatistics
- Python: Big data, machine learning, text analysis, full-stack development
03 - Your First Python Program
Core Question: How do I start coding?
Core Content:
- Hello World program
- Basic concepts: variables, data types, operators
- Simple calculation examples (education returns)
- Complete case: Income inequality analysis
  - Generate 5,000 samples
  - Descriptive statistics (mean, median, standard deviation)
  - Gini coefficient calculation
  - OLS regression analysis
  - Data visualization (income distribution histogram)
- Production-grade code vs beginner code comparison
- Practice exercises (with complete answers)
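As a preview of the income inequality case, the sketch below generates 5,000 simulated incomes, prints the descriptive statistics, and computes a Gini coefficient using only the standard library. The log-normal distribution, its parameters, the seed, and the `gini` helper are assumptions made for this illustration and are not necessarily what the full article uses.

```python
import random
import statistics


def gini(values):
    """Gini coefficient via the sorted-rank formula (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Weight each value by its 1-based rank in the sorted list
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Assumed data-generating process: log-normal incomes, fixed seed for repeatability
random.seed(42)
incomes = [random.lognormvariate(10, 0.8) for _ in range(5000)]

print("mean:  ", round(statistics.mean(incomes), 2))
print("median:", round(statistics.median(incomes), 2))
print("stdev: ", round(statistics.stdev(incomes), 2))
print("gini:  ", round(gini(incomes), 3))
```

Because simulated incomes are right-skewed, the mean should come out above the median, which is a useful first sanity check when reviewing the output.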
Learning Methods:
- Hands-on practice: run every example
- Modify parameters: understand code logic
- Review outputs: build intuition
- Progressive learning: from simple to complex
Quick Comparison of Three Languages
| Dimension | Python | Stata | R |
|---|---|---|---|
| Learning Curve | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Econometrics | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Machine Learning | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ |
| Text Analysis | ⭐⭐⭐⭐⭐ | ⭐ | ⭐⭐⭐ |
| Big Data Processing | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| Data Visualization | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Free & Open Source | Yes | No (commercial license) | Yes |
| Community Support | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
How to Choose?
By Research Task
| Task | Preferred Tool | Reason |
|---|---|---|
| OLS/Logit/Probit | Stata/Python | Stata is concise, Python is flexible |
| Panel Data/Fixed Effects | Stata | Most mature implementation |
| DID/RDD/PSM | Stata/R | Standardized workflow |
| Machine Learning Prediction | Python | Richest library ecosystem |
| Text Mining/NLP | Python | transformers, spaCy |
| Network Analysis | Python/R | NetworkX, igraph |
| Data Visualization | R/Python | ggplot2 strongest, Python second |
| Big Data (>10GB) | Python | Dask, PySpark |
By Learning Stage
Weeks 1-2 (Getting Started):
- Read all 3 articles in this module
- Run all code examples
- Complete practice exercises
Weeks 3-4 (Building Confidence):
- Replicate a Stata analysis in Python
- Try simple regression in Python
- Compare Python and Stata outputs
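For the "simple regression" step, a one-regressor OLS fit can be written by hand, which makes it easy to check against Stata's `regress log_wage schooling`. This is a minimal sketch: the `simple_ols` helper and the six data points are made up for illustration, and a real analysis would normally use a statistics library that also reports standard errors.

```python
import statistics


def simple_ols(x, y):
    """Return (intercept, slope) for y = a + b*x fitted by least squares."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    # Slope = sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data: years of schooling and log hourly wage
schooling = [8, 10, 12, 14, 16, 18]
log_wage = [2.1, 2.3, 2.6, 2.7, 3.0, 3.1]

a, b = simple_ols(schooling, log_wage)
print(f"intercept = {a:.3f}, slope = {b:.3f}")
```

If the same six observations are entered in Stata, the coefficient and constant should agree with these values up to rounding, which is the kind of output comparison this stage is meant to practice.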
Week 5 and Beyond (Going Deeper):
- Learn syntax and data processing in Modules 3-11
- Selectively learn advanced topics based on research needs
- Use Python in actual research
Module Study Recommendations
Minimalist Learning Path
If time is limited, prioritize in this order:
Must-Learn (1-2 hours):
- 01 - Why Learn Python (reading)
- 03 - Your First Python Program (hands-on practice)
Important (2-3 hours):
- 02 - Python vs Stata/R (focus on comparison tables)
- 03 - Complete case: Income inequality analysis
Enhancement (optional reading):
- 01 - Real research cases section
- 02 - Mixed-use strategies
Common Questions
Q: I have no programming experience at all. Can I learn it?
A: Yes! This course assumes zero background and starts from the most basic concepts. With one hour of practice a day, you can get started in about 2 weeks.
Q: Does learning Python mean giving up Stata/R?
A: No! Each of the three tools has its own strengths, and they complement one another. At first, you might use Python as your main tool with Stata/R as a supplement, then gradually settle into the workflow that suits you best.
Q: My research mainly uses Stata. Is Python necessary?
A: It depends on your research needs. If you only do traditional econometrics (OLS/fixed effects), Stata is sufficient. But if you work with text analysis, machine learning, or big data, Python is essential.
Q: How long before I can use Python for research?
A: A rough timeline: basic syntax (2 weeks) → data processing (2 weeks) → regression analysis (1 week), so about 5 weeks before you can do simple research. Deep mastery takes 3-6 months.
Next Steps
After completing this module, you will:
- Have clear motivation for learning Python
- Understand Python's position in the research workflow
- Successfully run your first data analysis program
- Be well-prepared for deeper learning
In Module 2, we will configure development environments (Jupyter, VS Code, Colab) and choose the programming tool that suits you best.
In Module 3, we will systematically learn Python syntax, from variables and data types to loops and conditionals.
Ready? Let's begin this exciting programming journey!