Module 1: Python Introduction & Navigation

Why Choose Python? — The Programming Starting Point for Social Science Researchers


Module Overview

Before beginning your Python journey, the most important question is: Why learn Python? This module will help you understand Python's unique value from a social science researcher's perspective, compare it with Stata/R, and take your first steps in programming.


Learning Objectives

After completing this module, you will be able to:

  • Understand Python's advantages and application scenarios in social science research
  • Clearly identify the similarities, differences, and complementary relationships between Python, Stata, and R
  • Know when to use Python versus Stata/R
  • Successfully run your first Python program
  • Understand basic concepts of variables, functions, and data analysis
  • Build confidence and adopt the right mindset for learning Python

Module Contents

01 - Why Learn Python?

Core Question: What can Python bring to social science research?

Core Content:

  • Python's rise in academia (top journal acceptance, citation trends)
  • Five core advantages: free & open-source, rich ecosystem, versatility, active community, career value
  • Real research cases:
    • Text analysis (Twitter sentiment analysis, transformers models)
    • Causal inference (Double Machine Learning)
    • Event studies (stock abnormal returns)
    • Network analysis (social network centrality)
  • Python's limitations (honest discussion)
  • Academic journal policies (AER, QJE, Econometrica)
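To give a flavor of the network analysis case listed above, here is a minimal sketch that computes degree centrality for a toy friendship network using only the standard library. The edge list, the names, and the `degree_centrality` helper are invented for illustration; they are not taken from the course materials, and real projects would more likely rely on a library such as NetworkX.

```python
from collections import defaultdict

# Hypothetical toy friendship network (undirected edges); names are invented.
edges = [("Ana", "Ben"), ("Ana", "Cai"), ("Ana", "Dee"), ("Ben", "Cai")]


def degree_centrality(edge_list):
    """Return each node's degree divided by (n - 1), the largest possible degree."""
    neighbors = defaultdict(set)
    for u, v in edge_list:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


centrality = degree_centrality(edges)

# Print nodes from most to least central.
for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.2f}")
```

In this toy network Ana is connected to everyone else, so she gets the highest score; Dee, with a single tie, gets the lowest.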

Why It Matters:

  • Build learning motivation: understand Python's unique value
  • Set reasonable expectations: know what can and cannot be done
  • Career development: master skills needed in both academia and industry

02 - Python vs Stata/R

Core Question: I already know Stata/R, why should I learn Python?

Core Content:

  • Basic syntax comparison (variable creation, loops, conditionals)
  • Data processing comparison (Pandas vs data.table vs Stata)
  • Advanced operations comparison:
    • Panel data regression (fixed effects)
    • Categorical variable handling
    • Time series operations
    • String processing
  • Complete workflow comparison for Difference-in-Differences (DID)
  • Data visualization comparison (coefplot implementation in three languages)
  • Performance benchmarking (processing 1 million rows)
  • Ecosystem comparison (package management, community, learning resources)
  • Mixed-use strategies (when to use which tool)
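The core arithmetic behind the Difference-in-Differences comparison listed above can be sketched in a few lines of Python. This is only the simplest two-group, two-period case, with invented outcome values and a hypothetical `did_estimate` helper; it is not the full workflow the article covers, which would normally use a regression to obtain standard errors.

```python
from statistics import mean

# Hypothetical outcome data (e.g., average wages); all values are invented.
outcomes = {
    ("treated", "pre"): [10.0, 11.0, 12.0],
    ("treated", "post"): [15.0, 16.0, 17.0],
    ("control", "pre"): [9.0, 10.0, 11.0],
    ("control", "post"): [11.0, 12.0, 13.0],
}


def did_estimate(data):
    """DID = (treated post - treated pre) - (control post - control pre)."""
    change_treated = mean(data[("treated", "post")]) - mean(data[("treated", "pre")])
    change_control = mean(data[("control", "post")]) - mean(data[("control", "pre")])
    return change_treated - change_control


effect = did_estimate(outcomes)
print(f"DID estimate: {effect:.2f}")
```

The control group's change stands in for what would have happened to the treated group without treatment, which is why it is subtracted from the treated group's change.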

How the Three Tools Are Positioned:

  • Stata: Econometrics standard, robustness checks, policy evaluation
  • R: Statistical modeling, academic plotting (ggplot2), biostatistics
  • Python: Big data, machine learning, text analysis, full-stack development

03 - Your First Python Program

Core Question: How do I start coding?

Core Content:

  • Hello World program
  • Basic concepts: variables, data types, operators
  • Simple calculation examples (education returns)
  • Complete case: Income inequality analysis
    • Generate 5,000 samples
    • Descriptive statistics (mean, median, standard deviation)
    • Gini coefficient calculation
    • OLS regression analysis
    • Data visualization (income distribution histogram)
  • Production-grade code vs beginner code comparison
  • Practice exercises (with complete answers)
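As a preview of the income inequality case outlined above, the following sketch simulates 5,000 incomes, prints descriptive statistics, and computes a Gini coefficient with the standard library alone. The log-normal distribution, its parameters, the seed, and the `gini` helper are assumptions made for this illustration and may differ from the article's actual code; the OLS regression and histogram steps are left out.

```python
import random
import statistics

random.seed(42)  # fixed seed so repeated runs give the same sample

# Simulate 5,000 incomes from a log-normal distribution.
# The parameters (mu=10, sigma=0.8) are arbitrary assumptions.
incomes = [random.lognormvariate(10, 0.8) for _ in range(5000)]


def gini(values):
    """Gini coefficient via the sorted-rank formula (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(f"Mean:   {statistics.mean(incomes):,.0f}")
print(f"Median: {statistics.median(incomes):,.0f}")
print(f"Std:    {statistics.stdev(incomes):,.0f}")
print(f"Gini:   {gini(incomes):.3f}")
```

Because the simulated distribution is right-skewed, the mean comes out above the median, a pattern typical of real income data.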

Learning Methods:

  • Hands-on practice: run every example
  • Modify parameters: understand code logic
  • Review outputs: build intuition
  • Progressive learning: from simple to complex
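The "run it, then modify a parameter" habit described above might look like this in practice. The sketch below prints a greeting and then runs a small returns-to-education calculation; the wage, the 8% return, and the `wage_after_schooling` helper are invented for illustration and are not the article's own example.

```python
# A first program: print a greeting, then do a small calculation with variables.
print("Hello, World!")

# Hypothetical numbers for a simple returns-to-education calculation.
base_wage = 3000.0      # monthly wage at baseline schooling (invented)
return_per_year = 0.08  # assumed 8% wage gain per extra year of schooling
extra_years = 4         # try changing this value and re-running


def wage_after_schooling(base, rate, years):
    """Compound the per-year return over the extra years of schooling."""
    return base * (1 + rate) ** years


new_wage = wage_after_schooling(base_wage, return_per_year, extra_years)
print(f"Wage after {extra_years} extra years: {new_wage:.2f}")
```

Changing `extra_years` or `return_per_year` and comparing the printed wage is a quick way to see how each input drives the result.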

Quick Comparison of Three Languages

Dimension | Python | Stata | R
Learning Curve⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Econometrics⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Machine Learning⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Text Analysis⭐⭐⭐⭐⭐⭐⭐⭐
Big Data Processing⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Data Visualization⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
Free & Open Source | Yes | No | Yes
Community Support⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐

How to Choose?

By Research Task

Task | Preferred Tool | Reason
OLS/Logit/Probit | Stata/Python | Stata is concise, Python is flexible
Panel Data/Fixed Effects | Stata | Most mature implementation
DID/RDD/PSM | Stata/R | Standardized workflow
Machine Learning Prediction | Python | Richest library ecosystem
Text Mining/NLP | Python | transformers, spaCy
Network Analysis | Python/R | NetworkX, igraph
Data Visualization | R/Python | ggplot2 strongest, Python second
Big Data (>10GB) | Python | Dask, PySpark

By Learning Stage

Weeks 1-2 (Getting Started):

  • Read all 3 articles in this module
  • Run all code examples
  • Complete practice exercises

Weeks 3-4 (Building Confidence):

  • Replicate a Stata analysis in Python
  • Try simple regression in Python
  • Compare Python and Stata outputs

Week 5 and Beyond (In-Depth Learning):

  • Learn syntax and data processing in Modules 3-11
  • Selectively learn advanced topics based on research needs
  • Use Python in actual research

Module Study Recommendations

Minimalist Learning Path

If time is limited, prioritize in this order:

Must-Learn (1-2 hours):

  • 01 - Why Learn Python (reading)
  • 03 - Your First Python Program (hands-on practice)

Important (2-3 hours):

  • 02 - Python vs Stata/R (focus on comparison tables)
  • 03 - Complete case: Income inequality analysis

Enhancement (optional reading):

  • 01 - Real research cases section
  • 02 - Mixed-use strategies

Common Questions

Q: I have no programming experience at all. Can I learn Python?
A: Yes. This course assumes no prior background and starts from the most basic concepts. With about an hour of practice a day, you can get started in roughly 2 weeks.

Q: Does learning Python mean giving up Stata/R?
A: No. Each of the three tools has its own strengths, and they complement one another. You might start with Python as your main tool and Stata/R as a supplement, then gradually settle on the workflow that suits you best.

Q: My research mainly uses Stata. Is Python necessary?
A: It depends on your research needs. If you only do traditional econometrics (OLS, fixed effects), Stata is sufficient. If you work with text analysis, machine learning, or big data, Python is essential.

Q: How long before I can use Python for research?
A: Basic syntax (2 weeks) → data processing (2 weeks) → regression analysis (1 week). After these 5 weeks in total, you can do simple research. Deep mastery takes 3-6 months.


Next Steps

After completing this module, you will:

  • Have clear motivation for learning Python
  • Understand Python's position in the research workflow
  • Successfully run your first data analysis program
  • Be well-prepared for deeper learning

In Module 2, we will configure development environments (Jupyter, VS Code, Colab) and choose the programming tool that suits you best.

In Module 3, we will systematically learn Python syntax, from variables and data types to loops and conditionals.

Ready? Let's begin this exciting programming journey!


Released under the MIT License. Content © Author.